Supplementary Material for the Non-Luminous Ostracod *Skogsbergia sp.*

This R Jupyter Notebook serves as a comprehensive repository for most of the scripts and figures behind the analyses of the non-bioluminescent ostracod Skogsbergia sp. described in the publication. It includes the following analyses: quality control (QC), differential gene expression (DGE), GO enrichment, cross-species DGE analysis, weighted gene co-expression network analysis (WGCNA), and conservation of BCN orthologs.

Author: Lisa Yeter Mesrop

Load libraries

In [2]:
#load libraries 
library(tidyverse) 
library(edgeR)
library(matrixStats)
library(DESeq2)
library(dplyr)
library(readxl)
library(data.table)
library(ggplot2)
library(WGCNA)
library(VennDiagram)
library(purrr)
library(reshape2)
library(knitr)
library(RColorBrewer)
library(pheatmap)
library(topGO)
library(ggvenn)

#always use the following WGCNA functions 
options(stringsAsFactors = FALSE);
enableWGCNAThreads();
allowWGCNAThreads(nThreads = 22)
── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──

 ggplot2 3.3.5      purrr   1.0.1
 tibble  3.1.6      dplyr   1.0.2
 tidyr   1.1.2      stringr 1.4.0
 readr   1.3.1      forcats 0.5.0

── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
 dplyr::filter() masks stats::filter()
 dplyr::lag()    masks stats::lag()

Warning message:
“package ‘edgeR’ was built under R version 3.6.2”
Loading required package: limma


Attaching package: ‘matrixStats’


The following object is masked from ‘package:dplyr’:

    count


Loading required package: S4Vectors

Loading required package: stats4

Loading required package: BiocGenerics

Loading required package: parallel


Attaching package: ‘BiocGenerics’


The following objects are masked from ‘package:parallel’:

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB


The following object is masked from ‘package:limma’:

    plotMA


The following objects are masked from ‘package:dplyr’:

    combine, intersect, setdiff, union


The following objects are masked from ‘package:stats’:

    IQR, mad, sd, var, xtabs


The following objects are masked from ‘package:base’:

    anyDuplicated, append, as.data.frame, basename, cbind, colnames,
    dirname, do.call, duplicated, eval, evalq, Filter, Find, get, grep,
    grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget,
    order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
    rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply,
    union, unique, unsplit, which, which.max, which.min



Attaching package: ‘S4Vectors’


The following objects are masked from ‘package:dplyr’:

    first, rename


The following object is masked from ‘package:tidyr’:

    expand


The following object is masked from ‘package:base’:

    expand.grid


Loading required package: IRanges


Attaching package: ‘IRanges’


The following objects are masked from ‘package:dplyr’:

    collapse, desc, slice


The following object is masked from ‘package:purrr’:

    reduce


Loading required package: GenomicRanges

Loading required package: GenomeInfoDb

Loading required package: SummarizedExperiment

Loading required package: Biobase

Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.



Attaching package: ‘Biobase’


The following objects are masked from ‘package:matrixStats’:

    anyMissing, rowMedians


Loading required package: DelayedArray

Loading required package: BiocParallel


Attaching package: ‘DelayedArray’


The following objects are masked from ‘package:matrixStats’:

    colMaxs, colMins, colRanges, rowMaxs, rowMins, rowRanges


The following object is masked from ‘package:purrr’:

    simplify


The following objects are masked from ‘package:base’:

    aperm, apply, rowsum



Attaching package: ‘data.table’


The following object is masked from ‘package:SummarizedExperiment’:

    shift


The following object is masked from ‘package:GenomicRanges’:

    shift


The following object is masked from ‘package:IRanges’:

    shift


The following objects are masked from ‘package:S4Vectors’:

    first, second


The following objects are masked from ‘package:dplyr’:

    between, first, last


The following object is masked from ‘package:purrr’:

    transpose


Loading required package: dynamicTreeCut

Loading required package: fastcluster


Attaching package: ‘fastcluster’


The following object is masked from ‘package:stats’:

    hclust



Attaching package: ‘WGCNA’


The following object is masked from ‘package:IRanges’:

    cor


The following object is masked from ‘package:S4Vectors’:

    cor


The following object is masked from ‘package:stats’:

    cor


Loading required package: grid

Loading required package: futile.logger


Attaching package: ‘reshape2’


The following objects are masked from ‘package:data.table’:

    dcast, melt


The following object is masked from ‘package:tidyr’:

    smiths


Loading required package: graph


Attaching package: ‘graph’


The following object is masked from ‘package:stringr’:

    boundary


Loading required package: GO.db

Loading required package: AnnotationDbi


Attaching package: ‘AnnotationDbi’


The following object is masked from ‘package:dplyr’:

    select


Loading required package: SparseM


Attaching package: ‘SparseM’


The following object is masked from ‘package:base’:

    backsolve



groupGOTerms: 	GOBPTerm, GOMFTerm, GOCCTerm environments built.


Attaching package: ‘topGO’


The following object is masked from ‘package:grid’:

    depth


The following object is masked from ‘package:IRanges’:

    members


Allowing parallel execution with up to 39 working processes.
Allowing multi-threading with up to 22 threads.

Load data

Import the Skogsbergia sp. gene expression matrix, which covers three tissue types (upper lip, compound eye, and gut) with five biological replicates per tissue. Generate the sample sheet (meta sheet) for downstream DESeq2 analyses, and read in the Trinotate annotation file for Skogsbergia sp.

In [3]:
#read in gene expression matrix 
skogs_counts <- read.delim("skogs_fasta90_isoform_combined.tab", header = TRUE, sep = "\t", quote = "")
In [4]:
head(skogs_counts)
A data.frame: 6 × 17
XSk.10A_fasta90_isoform.counts.tabSk.10B_fasta90_isoform.counts.tabSk.10C_fasta90_isoform.counts.tabSk.6A_fasta90_isoform.counts.tabSk.6B_fasta90_isoform.counts.tabSk.6C_fasta90_isoform..counts.tabSk.7A_fasta90_isoform.counts.tabSk.7B_fasta90_isoform.counts.tabSk.7C_fasta90_isoform.counts.tabSk.8A_fasta90_isoform.counts.tabSk.8B_fasta90_isoform..counts.tabSk.8C_fasta90_isoform..counts.tabSk.9A_fasta90_isoform.counts.tabSk.9B_fasta90_isoform..counts.tabSk.9C_fasta90_isoform.counts.tabX.1
<chr><int><int><int><int><int><int><int><int><int><int><int><int><int><int><int><lgl>
1TRINITY_DN0_c0_g1_i2 0145700 1 75515 320113 27141 5 5 533NA
2TRINITY_DN0_c0_g3_i1 6 1 68618 19513 2 32 6 3 31 3 0 68NA
3TRINITY_DN0_c0_g4_i2 2269289 0169932136847419450613971367NA
4TRINITY_DN100000_c0_g1_i10 0 00 0 0 3 1 1 8 0 6 0 0 0NA
5TRINITY_DN100005_c0_g1_i17 5 100 4 610 2 5 6 1 10 510 7NA
6TRINITY_DN100007_c0_g1_i10 0 44 0 4 1 0 9 0 4 7 4 0 3NA
In [5]:
#use the transcript IDs in column X as row names
row.names(skogs_counts) <-skogs_counts$X
In [6]:
head(skogs_counts)
A data.frame: 6 × 17
XSk.10A_fasta90_isoform.counts.tabSk.10B_fasta90_isoform.counts.tabSk.10C_fasta90_isoform.counts.tabSk.6A_fasta90_isoform.counts.tabSk.6B_fasta90_isoform.counts.tabSk.6C_fasta90_isoform..counts.tabSk.7A_fasta90_isoform.counts.tabSk.7B_fasta90_isoform.counts.tabSk.7C_fasta90_isoform.counts.tabSk.8A_fasta90_isoform.counts.tabSk.8B_fasta90_isoform..counts.tabSk.8C_fasta90_isoform..counts.tabSk.9A_fasta90_isoform.counts.tabSk.9B_fasta90_isoform..counts.tabSk.9C_fasta90_isoform.counts.tabX.1
<chr><int><int><int><int><int><int><int><int><int><int><int><int><int><int><int><lgl>
TRINITY_DN0_c0_g1_i2TRINITY_DN0_c0_g1_i2 0145700 1 75515 320113 27141 5 5 533NA
TRINITY_DN0_c0_g3_i1TRINITY_DN0_c0_g3_i1 6 1 68618 19513 2 32 6 3 31 3 0 68NA
TRINITY_DN0_c0_g4_i2TRINITY_DN0_c0_g4_i2 2269289 0169932136847419450613971367NA
TRINITY_DN100000_c0_g1_i1TRINITY_DN100000_c0_g1_i10 0 00 0 0 3 1 1 8 0 6 0 0 0NA
TRINITY_DN100005_c0_g1_i1TRINITY_DN100005_c0_g1_i17 5 100 4 610 2 5 6 1 10 510 7NA
TRINITY_DN100007_c0_g1_i1TRINITY_DN100007_c0_g1_i10 0 44 0 4 1 0 9 0 4 7 4 0 3NA
In [7]:
#remove column X (now stored as row names) and the empty trailing column X.1
skogs_counts$X.1 <- NULL 
skogs_counts$X <- NULL
In [8]:
head(skogs_counts)
A data.frame: 6 × 15
Sk.10A_fasta90_isoform.counts.tabSk.10B_fasta90_isoform.counts.tabSk.10C_fasta90_isoform.counts.tabSk.6A_fasta90_isoform.counts.tabSk.6B_fasta90_isoform.counts.tabSk.6C_fasta90_isoform..counts.tabSk.7A_fasta90_isoform.counts.tabSk.7B_fasta90_isoform.counts.tabSk.7C_fasta90_isoform.counts.tabSk.8A_fasta90_isoform.counts.tabSk.8B_fasta90_isoform..counts.tabSk.8C_fasta90_isoform..counts.tabSk.9A_fasta90_isoform.counts.tabSk.9B_fasta90_isoform..counts.tabSk.9C_fasta90_isoform.counts.tab
<int><int><int><int><int><int><int><int><int><int><int><int><int><int><int>
TRINITY_DN0_c0_g1_i20145700 1 75515 320113 27141 5 5 533
TRINITY_DN0_c0_g3_i16 1 68618 19513 2 32 6 3 31 3 0 68
TRINITY_DN0_c0_g4_i22269289 0169932136847419450613971367
TRINITY_DN100000_c0_g1_i10 0 00 0 0 3 1 1 8 0 6 0 0 0
TRINITY_DN100005_c0_g1_i17 5 100 4 610 2 5 6 1 10 510 7
TRINITY_DN100007_c0_g1_i10 0 44 0 4 1 0 9 0 4 7 4 0 3
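The cells above set the row names from column X and then drop X and the empty X.1 column. The same result can be obtained while reading the file. The sketch below is an illustration under assumptions, not the notebook's code: it uses a small inline stand-in for skogs_fasta90_isoform_combined.tab with hypothetical IDs and counts, and it assumes the real file has an empty first header field and a trailing tab on every line, which is what would produce the X and X.1 columns.

```r
# Hypothetical stand-in for skogs_fasta90_isoform_combined.tab:
# an empty first header field and a trailing tab on every line
toy_tab <- paste(
  "\tSk.10A\tSk.10B\tSk.10C\t",
  "tx1\t0\t12\t7\t",
  "tx2\t6\t1\t0\t",
  sep = "\n"
)

# row.names = 1 turns the transcript ID column into row names while reading
counts_df <- read.delim(text = toy_tab, header = TRUE, sep = "\t",
                        quote = "", row.names = 1)

# The trailing tab yields an extra all-NA column (X.1 in the notebook); drop it
counts_df <- counts_df[, colSums(!is.na(counts_df)) > 0, drop = FALSE]

# DESeqDataSetFromMatrix() expects integer counts; all remaining columns are integer
count_matrix <- as.matrix(counts_df)
count_matrix
```

On the real file, the same call would read the file by name instead of `text =`, followed by the same all-NA column drop.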
In [9]:
meta <- data.frame(row.names = colnames(skogs_counts))
In [10]:
head(meta)
A data.frame: 6 × 0
Sk.10A_fasta90_isoform.counts.tab
Sk.10B_fasta90_isoform.counts.tab
Sk.10C_fasta90_isoform.counts.tab
Sk.6A_fasta90_isoform.counts.tab
Sk.6B_fasta90_isoform.counts.tab
Sk.6C_fasta90_isoform..counts.tab
In [11]:
sample_name = c("Upper_lip", "Eye", "Gut", "Upper_lip", "Eye", "Gut", "Upper_lip", "Eye", "Gut", "Upper_lip", "Eye", "Gut","Upper_lip", "Eye", "Gut")
In [12]:
meta$sample_name <- sample_name
In [13]:
meta$names <- rownames(meta)
In [14]:
rownames(meta) <- NULL
In [15]:
meta
A data.frame: 15 × 2
sample_name  names
<chr>        <chr>
Upper_lip    Sk.10A_fasta90_isoform.counts.tab
Eye          Sk.10B_fasta90_isoform.counts.tab
Gut          Sk.10C_fasta90_isoform.counts.tab
Upper_lip    Sk.6A_fasta90_isoform.counts.tab
Eye          Sk.6B_fasta90_isoform.counts.tab
Gut          Sk.6C_fasta90_isoform..counts.tab
Upper_lip    Sk.7A_fasta90_isoform.counts.tab
Eye          Sk.7B_fasta90_isoform.counts.tab
Gut          Sk.7C_fasta90_isoform.counts.tab
Upper_lip    Sk.8A_fasta90_isoform.counts.tab
Eye          Sk.8B_fasta90_isoform..counts.tab
Gut          Sk.8C_fasta90_isoform..counts.tab
Upper_lip    Sk.9A_fasta90_isoform.counts.tab
Eye          Sk.9B_fasta90_isoform..counts.tab
Gut          Sk.9C_fasta90_isoform.counts.tab
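Typing sample_name by hand works, but it silently depends on the column order of the count table. A sketch that derives the tissue from each column name instead, assuming the letter after the replicate number encodes the tissue (A = upper lip, B = compound eye, C = gut, as read off the meta table above); meta_sketch, sample_cols and tissue_by_letter are illustrative names:

```r
# Count-table column names, assumed to follow Sk.<replicate><letter>_...
sample_cols <- paste0("Sk.", rep(c(10, 6, 7, 8, 9), each = 3), c("A", "B", "C"),
                      "_fasta90_isoform.counts.tab")

# Assumed mapping from the letter to the tissue, taken from the meta table above
tissue_by_letter <- c(A = "Upper_lip", B = "Eye", C = "Gut")

# Extract the letter after the replicate number; the pattern also tolerates
# the "isoform..counts" spelling that some columns have
tissue_letter <- sub("^Sk\\.[0-9]+([ABC])_.*$", "\\1", sample_cols)

meta_sketch <- data.frame(
  sample_name = unname(tissue_by_letter[tissue_letter]),
  names = sample_cols,
  stringsAsFactors = FALSE
)

# Every tissue should have five replicates
table(meta_sketch$sample_name)
```

On the real data, sample_cols would simply be colnames(skogs_counts); deriving the labels from the names keeps the sheet correct even if the columns are reordered.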

QC for downstream analyses

In [16]:
#import count table, meta and sample_name into a DESeq2 object
dds_count_table <- DESeqDataSetFromMatrix(countData = skogs_counts, colData = meta, design = ~sample_name)
Warning message in DESeqDataSet(se, design = design, ignoreRank):
“some variables in design formula are characters, converting to factors”
In [17]:
nrow(counts(dds_count_table))
81092
In [18]:
#total raw (prefiltered) counts per sample, i.e. the library size
colSums(assay(dds_count_table))
Sk.10A_fasta90_isoform.counts.tab
754726
Sk.10B_fasta90_isoform.counts.tab
940112
Sk.10C_fasta90_isoform.counts.tab
2224624
Sk.6A_fasta90_isoform.counts.tab
854632
Sk.6B_fasta90_isoform.counts.tab
464466
Sk.6C_fasta90_isoform..counts.tab
2614268
Sk.7A_fasta90_isoform.counts.tab
1114111
Sk.7B_fasta90_isoform.counts.tab
1087820
Sk.7C_fasta90_isoform.counts.tab
1927494
Sk.8A_fasta90_isoform.counts.tab
830645
Sk.8B_fasta90_isoform..counts.tab
1065826
Sk.8C_fasta90_isoform..counts.tab
1999424
Sk.9A_fasta90_isoform.counts.tab
823102
Sk.9B_fasta90_isoform..counts.tab
958184
Sk.9C_fasta90_isoform.counts.tab
2480345

Barplot of counts for all samples

Library sizes (total raw counts per sample) were examined using a bar plot.

In [19]:
#visualize prefiltered raw counts in a barplot  

librarySizes <- colSums(assay(dds_count_table))

par(mar=c(10,5,2,2))  
barplot(librarySizes, 
        las=2, 
        cex.names=.5,
        main="Barplot of raw count distributions of samples")
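The bar plot can be backed by a quick numeric summary. The sketch below copies the library sizes printed by colSums(assay(dds_count_table)) above (sample names shortened to Sk.<replicate><letter> for readability) and assumes the tissue order of the meta sheet; it reports the mean library size per tissue and the ratio of the largest to the smallest library.

```r
# Library sizes as printed by colSums(assay(dds_count_table)) above
lib_sizes <- c(
  Sk.10A = 754726,  Sk.10B = 940112,  Sk.10C = 2224624,
  Sk.6A  = 854632,  Sk.6B  = 464466,  Sk.6C  = 2614268,
  Sk.7A  = 1114111, Sk.7B  = 1087820, Sk.7C  = 1927494,
  Sk.8A  = 830645,  Sk.8B  = 1065826, Sk.8C  = 1999424,
  Sk.9A  = 823102,  Sk.9B  = 958184,  Sk.9C  = 2480345
)

# Tissue labels in the same order as the meta sheet
tissue <- rep(c("Upper_lip", "Eye", "Gut"), times = 5)

# Mean library size per tissue, in millions of counts
mean_by_tissue <- tapply(lib_sizes, tissue, mean) / 1e6

# Spread between the largest and the smallest library
max_min_ratio <- max(lib_sizes) / min(lib_sizes)

round(mean_by_tissue, 2)
round(max_min_ratio, 1)
```

With these values the gut libraries have the largest mean size, and the largest library is more than five times the smallest. Differences of this kind are what DESeq2's size-factor normalization is meant to absorb before samples are compared.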

Visualize transformed expression matrix with hierarchical clustering and PCA

In [20]:
# run DESeq2 function and normalization  
dds_raw_counts <- DESeq(dds_count_table, betaPrior = FALSE, parallel = TRUE) 
# Perform a variance-stabilizing transformation
vsd_raw_counts <- varianceStabilizingTransformation(dds_raw_counts)
estimating size factors

estimating dispersions

gene-wise dispersion estimates: 38 workers

mean-dispersion relationship

final dispersion estimates, fitting model and testing: 38 workers
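DESeq() starts by estimating size factors. By default DESeq2 uses the median-of-ratios approach: a sample's size factor is the median, over genes with non-zero counts in every sample, of the ratio between that sample's count and the gene's geometric mean across samples. The function below is a simplified base-R sketch of that idea on a hypothetical toy matrix, not DESeq2's own implementation; sizeFactors(dds_raw_counts) returns the values DESeq2 actually used.

```r
median_of_ratios <- function(counts) {
  log_counts <- log(counts)                 # log(0) is -Inf
  log_geo_means <- rowMeans(log_counts)     # -Inf for any gene with a zero count
  keep <- is.finite(log_geo_means)          # use only genes counted in every sample
  # log ratio of each count to its gene's geometric mean (row-wise recycling)
  log_ratios <- log_counts[keep, , drop = FALSE] - log_geo_means[keep]
  exp(apply(log_ratios, 2, median))
}

# Hypothetical counts: s2 is s1 sequenced twice as deeply, s3 half as deeply;
# the last gene has a zero and is ignored when estimating size factors
toy_counts <- cbind(
  s1 = c(10, 20, 30, 0),
  s2 = c(20, 40, 60, 5),
  s3 = c(5, 10, 15, 1)
)

sf <- median_of_ratios(toy_counts)
normalized <- sweep(toy_counts, 2, sf, "/")   # divide each column by its size factor
sf
normalized
```

Because the median is taken over many genes, a handful of strongly tissue-specific genes does not dominate the estimate, which matters for tissues as different as gut and compound eye.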

In [21]:
sampleDists_raw_counts <- dist(t(assay(vsd_raw_counts)))
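dist() computes distances between the rows of a matrix, while assay(vsd_raw_counts) has genes in rows and samples in columns; t() is therefore needed so that the distances are between samples. A toy check with a hypothetical 3-gene by 3-sample matrix (Euclidean distance, the dist() default):

```r
# Hypothetical VST-like values: rows = genes, columns = samples
vst_toy <- matrix(
  c(1, 2, 3,    # column S1
    1, 2, 4,    # column S2
    5, 5, 5),   # column S3
  nrow = 3,
  dimnames = list(paste0("gene", 1:3), c("S1", "S2", "S3"))
)

# Transpose so that samples are rows, then compute pairwise Euclidean distances
sample_dists <- dist(t(vst_toy))
sample_dist_mat <- as.matrix(sample_dists)
sample_dist_mat
```

S1 and S2 differ in one gene by one unit, so they are the closest pair; the same dist object is what the heatmap cell passes to pheatmap for clustering rows and columns.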
In [22]:
#plot the heatmap
sampleDists_raw_counts_Matrix <- as.matrix(sampleDists_raw_counts)
rownames(sampleDists_raw_counts_Matrix) <- paste(colData(dds_raw_counts)$sample_name) 
colnames(sampleDists_raw_counts_Matrix) <- colData(dds_raw_counts)$names
colors <- colorRampPalette( rev(brewer.pal(9, "Blues")) )(255)
pheatmap(sampleDists_raw_counts_Matrix,
          clustering_distance_rows=sampleDists_raw_counts,
         clustering_distance_cols=sampleDists_raw_counts,
         col=colors)
In [25]:
#plot the PCA
pcaData_raw_counts <- plotPCA(vsd_raw_counts, intgroup="sample_name", returnData=TRUE)
percentVar_raw_counts <- round(100 * attr(pcaData_raw_counts, "percentVar"))
ggplot(pcaData_raw_counts, aes(PC1, PC2, color=sample_name, shape=sample_name)) +
  geom_point(size=5) +
  xlab(paste0("PC1: ", percentVar_raw_counts[1], "% variance")) +
  ylab(paste0("PC2: ", percentVar_raw_counts[2], "% variance")) +
  coord_fixed() +
  scale_color_manual(values = c('#F2C93D','#C97D97','#F1AFB4')) +
  theme(panel.grid.major = element_line(colour = "gray97", size = 1),
        panel.grid.minor = element_line(linetype = "dotted"),
        panel.background = element_rect(fill = NA),
        legend.key = element_rect(fill = "gray100"),
        axis.line = element_line(size = 0.5, linetype = "solid"),
        panel.border = element_rect(colour = "black", fill = NA, size = 1),
        legend.title = element_text(size = 16),
        legend.text = element_text(size = 14),
        axis.title.x = element_text(size = 16),
        axis.title.y = element_text(size = 16),
        axis.text = element_text(size = 12))
In [24]:
options(repr.plot.width=8, repr.plot.height=6, repr.plot.res = 150)

Import the annotation for Skogsbergia sp. transcriptome

In [27]:
#read in the Trinotate sheet for Skogsbergia sp. 

Trinotate_lym_subset_skogs <- read.csv(file = "Trinotate_lym_subset_Skogsbergia_cdhit90_longestisoform.csv")

Prep expression matrix for WGCNA

As a preliminary quality control (QC) step for the WGCNA analysis, we assessed the overall similarity between samples and filtered out transcripts with low counts, since these often introduce noise into co-expression analyses. The filter retained only transcripts with at least 5 counts in at least 5 samples, matching the five biological replicates per tissue type. The QC analyses used functions from the DESeq2 package (Love et al., 2014). The WGCNA analysis is used to construct co-expression networks for Skogsbergia sp. and to assess the conservation of BCN orthologs in the networks of the non-luminous Skogsbergia sp. (Section 7).
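The filter in the next code cell keeps a transcript only if it has at least 5 counts in at least 5 samples. A base-R sketch of the same rule on a hypothetical three-transcript, six-sample matrix (min_count and min_samples are illustrative names) shows which rows survive:

```r
# Hypothetical counts: 3 transcripts x 6 samples
toy_counts <- rbind(
  tx1 = c(5, 5, 5, 5, 5, 0),       # >= 5 counts in 5 samples: kept
  tx2 = c(10, 10, 10, 10, 0, 0),   # >= 5 counts in only 4 samples: dropped
  tx3 = c(4, 4, 4, 4, 4, 4)        # never reaches 5 counts: dropped
)
min_count   <- 5   # minimum count in a sample
min_samples <- 5   # minimum number of samples (one tissue's replicates)

# Number of samples in which each transcript reaches min_count
n_samples_passing <- rowSums(toy_counts >= min_count)
keep <- n_samples_passing >= min_samples
filtered <- toy_counts[keep, , drop = FALSE]
filtered
```

Tying min_samples to the replicate number means a transcript expressed in only one tissue can still pass, provided it is consistently expressed across that tissue's five replicates.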

In [28]:
#import the count table into a DESeq2 object
dds_count_table_wgcna <- DESeqDataSetFromMatrix(countData = skogs_counts, colData = meta, design = ~sample_name)
Warning message in DESeqDataSet(se, design = design, ignoreRank):
“some variables in design formula are characters, converting to factors”
In [29]:
#keep transcripts with at least 5 counts in at least 5 samples
dds_merged_table_prefiltered_wgcna <- dds_count_table_wgcna[rowSums(counts(dds_count_table_wgcna) >= 5) >= 5, ]
nrow(dds_merged_table_prefiltered_wgcna)
25909
In [30]:
# run DESeq2 function and normalization
# betaPrior = FALSE is already the DESeq2 default; parallel = TRUE (default FALSE) uses the registered BiocParallel backend
dds_prefiltered_wgcna <- DESeq(dds_merged_table_prefiltered_wgcna, betaPrior = FALSE, parallel = TRUE)
# Perform a variance-stabilizing transformation
vsd_prefiltered_wgcna <- varianceStabilizingTransformation(dds_prefiltered_wgcna)
estimating size factors

estimating dispersions

gene-wise dispersion estimates: 38 workers

mean-dispersion relationship

final dispersion estimates, fitting model and testing: 38 workers

In [32]:
#compute Euclidean distances between samples (t() puts samples in rows)
sampleDists_wgcna <- dist(t(assay(vsd_prefiltered_wgcna)))
In [33]:
#plot the heatmap
sampleDistMatrix_wgcna <- as.matrix(sampleDists_wgcna)
rownames(sampleDistMatrix_wgcna) <- paste(colData(dds_merged_table_prefiltered_wgcna)$sample_name) 
colnames(sampleDistMatrix_wgcna) <- colData(dds_merged_table_prefiltered_wgcna)$names
colors <- colorRampPalette( rev(brewer.pal(9, "Blues")) )(255)
pheatmap(sampleDistMatrix_wgcna,
          clustering_distance_rows=sampleDists_wgcna,
         clustering_distance_cols=sampleDists_wgcna,
         col=colors)
In [34]:
#plot the PCA 

pcaData_prefiltered_wgcna <- plotPCA(vsd_prefiltered_wgcna, intgroup="sample_name", returnData=TRUE)
percentVar_prefiltered_wgcna <- round(100 * attr(pcaData_prefiltered_wgcna, "percentVar"))
ggplot(pcaData_prefiltered_wgcna, aes(PC1, PC2, color=sample_name, shape=sample_name)) +
  geom_point(size=5) +
theme(panel.grid.major = element_line(colour = "gray97",  size = 1), panel.grid.minor = element_line(linetype = "dotted"), panel.background = element_rect(fill = NA), 
   legend.key = element_rect(fill = "gray100")) + theme(axis.line = element_line(size = 0.5,linetype = "solid")) + 
  theme(panel.border = element_rect(colour = "black", fill=NA, size=1)) +
scale_color_manual(values = c('#C24C3D','#8E3DC2','#E69F00'))+
  xlab(paste0("PC1: ",percentVar_prefiltered_wgcna[1],"% variance")) +
  ylab(paste0("PC2: ",percentVar_prefiltered_wgcna[2],"% variance")) + 
  coord_fixed()

Differential gene expression

Differential gene expression analysis was carried out in DESeq2 (Love et al., 2014). For Skogsbergia sp., we identified significantly upregulated genes in three tissue types (upper lip, compound eye, and gut) using five biological replicates per tissue. Genes were considered differentially expressed at a Benjamini-Hochberg adjusted p-value (FDR) < 0.05 and a fold change > 1.5 (|log2 fold change| > 0.58). Pairwise comparisons were made between tissue types (upper lip vs. compound eye, upper lip vs. gut, and gut vs. compound eye). To identify tissue-specific differential expression (i.e., genes significantly upregulated in one tissue relative to both others), each tissue was compared with the other two; for example, upregulation in the upper lip was determined by comparing it with both the compound eye and the gut. For each pairwise comparison, a reference tissue was specified, so that the sign of the log2 fold change indicates the tissue in which a gene is upregulated (positive: the tissue of interest; negative: the reference tissue).
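Comparing each tissue with the other two amounts to intersecting the upregulated sets from two contrasts that share the same numerator. The code for that step is not part of this section, so the following is only a base-R sketch under assumptions: the two tables are hypothetical toy stand-ins for the significantly upregulated results of Upper_lip vs Eye and Upper_lip vs Gut (only the Upper_lip vs Eye and Gut vs Eye contrasts appear in this section).

```r
# Hypothetical significantly upregulated transcripts from two contrasts
up_lip_vs_eye <- data.frame(transcript_id = c("tx1", "tx2", "tx3"),
                            log2FoldChange = c(2.1, 0.9, 3.4),
                            stringsAsFactors = FALSE)
up_lip_vs_gut <- data.frame(transcript_id = c("tx2", "tx3", "tx4"),
                            log2FoldChange = c(1.2, 2.8, 0.7),
                            stringsAsFactors = FALSE)

# Transcripts upregulated in the upper lip against BOTH other tissues
lip_specific_ids <- intersect(up_lip_vs_eye$transcript_id, up_lip_vs_gut$transcript_id)

# Inner join keeps the fold changes from both contrasts side by side
lip_specific <- merge(up_lip_vs_eye, up_lip_vs_gut, by = "transcript_id",
                      suffixes = c("_vs_eye", "_vs_gut"))
lip_specific
```

tx1 is upregulated only against the eye and tx4 only against the gut, so neither counts as upper-lip specific under this definition.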

In [35]:
dds_count_table
class: DESeqDataSet 
dim: 81092 15 
metadata(1): version
assays(1): counts
rownames(81092): TRINITY_DN0_c0_g1_i2 TRINITY_DN0_c0_g3_i1 ...
  TRINITY_DN99_c4_g1_i1 TRINITY_DN9_c0_g1_i1
rowData names(0):
colnames(15): Sk.10A_fasta90_isoform.counts.tab
  Sk.10B_fasta90_isoform.counts.tab ...
  Sk.9B_fasta90_isoform..counts.tab Sk.9C_fasta90_isoform.counts.tab
colData names(2): sample_name names
In [36]:
#run DESeq2 analysis
dds_DE <- DESeq(dds_count_table)
estimating size factors

estimating dispersions

gene-wise dispersion estimates

mean-dispersion relationship

final dispersion estimates

fitting model and testing

DGE - Upper Lip vs Compound Eye

In [37]:
#set eye as a reference tissue 
dds_DE$sample_name <- relevel(dds_DE$sample_name, ref= "Eye")
In [38]:
#rerun DESeq command after reference is specified 
dds_DE <- DESeq(dds_DE)
using pre-existing size factors

estimating dispersions

found already estimated dispersions, replacing these

gene-wise dispersion estimates

mean-dispersion relationship

final dispersion estimates

fitting model and testing
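Releveling changes which level is the reference for the model coefficients (for example, the names returned by resultsNames()). Every results() call in this section passes an explicit contrast = c("sample_name", numerator, denominator), so the reported log2 fold change is log2(numerator/denominator) either way. Assuming DESeq2 converted the character column with factor()'s default alphabetical ordering (as the "converting to factors" warning suggests), a quick base-R check shows that Eye was already the reference level:

```r
# sample_name as DESeq2 received it: a character vector in meta-sheet order
tissue <- factor(rep(c("Upper_lip", "Eye", "Gut"), times = 5))

# factor() orders levels alphabetically, so "Eye" is already the first (reference) level
levels(tissue)

# relevel() moves "Eye" to the front, which here leaves the order unchanged
tissue_eye_ref <- relevel(tissue, ref = "Eye")
identical(levels(tissue), levels(tissue_eye_ref))
```

Keeping the relevel() call is still a reasonable safeguard, since it makes the intended reference explicit rather than relying on alphabetical order.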

In [39]:
#define contrasts, extract results table, and shrink the log2 fold changes

res_tableOE_unshrunken_UpperlipVs_Eye <- results(dds_DE, contrast= c("sample_name", "Upper_lip", "Eye") , alpha = 0.05)


res_tableOE_UpperlipVs_Eye <- lfcShrink(dds_DE, contrast= c("sample_name", "Upper_lip", "Eye"), res = res_tableOE_unshrunken_UpperlipVs_Eye)
using 'normal' for LFC shrinkage, the Normal prior from Love et al (2014).

Note that type='apeglm' and type='ashr' have shown to have less bias than type='normal'.
See ?lfcShrink for more details on shrinkage type, and the DESeq2 vignette.
Reference: https://doi.org/10.1093/bioinformatics/bty895

In [40]:
mcols(res_tableOE_UpperlipVs_Eye, use.names=T)
DataFrame with 6 rows and 2 columns
                       type
                <character>
baseMean       intermediate
log2FoldChange      results
lfcSE               results
stat                results
pvalue              results
padj                results
                                                        description
                                                        <character>
baseMean                  mean of normalized counts for all samples
log2FoldChange log2 fold change (MAP): sample_name Upper_lip vs Eye
lfcSE                  standard error: sample_name Upper_lip vs Eye
stat                   Wald statistic: sample name Upper lip vs Eye
pvalue              Wald test p-value: sample name Upper lip vs Eye
padj                                           BH adjusted p-values
In [41]:
# set thresholds
# lfc.cutoff: a log2 fold change of 0.58 corresponds to a 1.5-fold change (2^0.58 ≈ 1.5)
# padj.cutoff: Benjamini-Hochberg adjusted p-value of 0.05

padj.cutoff <- 0.05
lfc.cutoff <- 0.58
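The cutoff arithmetic and the split by direction can be checked directly. In this sketch the five-row results table is hypothetical; it shows that 0.58 is a rounded log2(1.5), and how the abs() filter and the signed filters separate genes upregulated in the numerator tissue from genes upregulated in the reference tissue.

```r
padj.cutoff <- 0.05
lfc.cutoff <- 0.58

# Fold change implied by the rounded log2 cutoff, and the exact log2 of 1.5
implied_fold_change <- 2^lfc.cutoff
exact_lfc_cutoff <- log2(1.5)

# Hypothetical results table
toy_res <- data.frame(
  log2FoldChange = c(-1.2, -0.4, 0.3, 0.7, 2.0),
  padj           = c(0.01, 0.01, 0.20, 0.03, 0.06)
)

# All significant genes, regardless of direction
is_sig  <- toy_res$padj < padj.cutoff & abs(toy_res$log2FoldChange) > lfc.cutoff
# Upregulated in the numerator tissue (positive log2 fold change)
is_up   <- is_sig & toy_res$log2FoldChange > lfc.cutoff
# Upregulated in the reference tissue (negative log2 fold change)
is_down <- is_sig & toy_res$log2FoldChange < -lfc.cutoff

c(implied_fold_change = implied_fold_change, exact_lfc_cutoff = exact_lfc_cutoff)
which(is_up)
which(is_down)
```

With 0.58 the effective fold-change threshold is about 1.495; log2(1.5) would make it exactly 1.5. DESeq2 also sets padj to NA for some genes (for example, those removed by independent filtering): dplyr::filter() drops rows whose condition is NA, whereas base `[` subsetting with such a logical index would return rows of NA.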
In [42]:
res_tableOE_UpperlipVs_Eye_tb <- res_tableOE_UpperlipVs_Eye %>%
  data.frame() %>%
  rownames_to_column(var="gene") %>% 
  as_tibble()
In [43]:
#determine all differentially expressed genes 
sigOE_UpperlipVs_Eye <- res_tableOE_UpperlipVs_Eye_tb %>%
        filter(padj < padj.cutoff & abs(log2FoldChange) > lfc.cutoff)
In [44]:
colnames(sigOE_UpperlipVs_Eye)[1]<- "transcript_id"
In [45]:
head(sigOE_UpperlipVs_Eye)
A tibble: 6 × 7
transcript_id              baseMean   log2FoldChange  lfcSE      stat       pvalue        padj
<chr>                      <dbl>      <dbl>           <dbl>      <dbl>      <dbl>         <dbl>
TRINITY_DN100127_c0_g1_i1  3.608228   4.480129        1.0997747  4.013424   5.984419e-05  0.007626394
TRINITY_DN10153_c1_g1_i1   47.303025  -1.928367       0.5315456  -3.618480  2.963379e-04  0.023914234
TRINITY_DN1015_c0_g1_i4    23.418584  -6.401983       1.4252019  -3.850651  1.178042e-04  0.012554149
TRINITY_DN101713_c0_g1_i1  9.193390   3.706762        0.8953110  3.997082   6.412797e-05  0.008066841
TRINITY_DN10186_c1_g1_i1   4.266713   -4.489130       1.2628190  -3.476959  5.071345e-04  0.035493613
TRINITY_DN10190_c0_g1_i1   3.596375   3.404744        1.0077576  3.406440   6.581610e-04  0.042200953
In [46]:
#extract all genes that are significantly upregulated in the Upper Lip (positive log2 fold change)
Upperlip_Vs_eye_sigOE_UPREGULATED_logfold <- sigOE_UpperlipVs_Eye %>%
        filter(padj < padj.cutoff & log2FoldChange > lfc.cutoff)
In [47]:
colnames(Upperlip_Vs_eye_sigOE_UPREGULATED_logfold)[1]<- "transcript_id"
In [48]:
#add the annotation  
Upperlip_Vs_eye_sigOE_UPREGULATED_logfold_annot <- setDT(Trinotate_lym_subset_skogs, key = 'transcript_id')[J(Upperlip_Vs_eye_sigOE_UPREGULATED_logfold)]
In [3]:
#genes that are significantly upregulated in the Upper Lip (positive log2 fold change)
head(Upperlip_Vs_eye_sigOE_UPREGULATED_logfold_annot)
A data.table: 6 × 23
#gene_idtranscript_idsprot_Top_BLASTX_hitRNAMMERprot_idprot_coordssprot_Top_BLASTP_hitPfamSignalPTmHMMgene_ontology_BLASTPgene_ontology_PfamtranscriptpeptidebaseMeanlog2FoldChangelfcSEstatpvaluepadj
<chr><chr><chr><chr><chr><chr><chr><chr><chr><chr><chr><chr><chr><chr><dbl><dbl><dbl><dbl><dbl><dbl>
TRINITY_DN100127_c0_g1TRINITY_DN100127_c0_g1_i1. .TRINITY_DN100127_c0_g1_i1.p17-969[-] . . sigP:1^18^0.696.. . .. 3.6082284.4801291.09977474.0134245.984419e-050.007626394
TRINITY_DN101713_c0_g1TRINITY_DN101713_c0_g1_i1CIB3_HUMAN^CIB3_HUMAN^Q:51-617,H:1-187^51.323%ID^E:4.54e-65^RecName: Full=Calcium and integrin-binding family member 3;^Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo .TRINITY_DN101713_c0_g1_i1.p13-620[+] CIB3_HUMAN^CIB3_HUMAN^Q:17-205,H:1-187^51.323%ID^E:3.41e-65^RecName: Full=Calcium and integrin-binding family member 3;^Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo PF13499.9^EF-hand_7^EF-hand domain pair^127-191^E:4.1e-11 . .GO:0005509^molecular_function^calcium ion binding`GO:0000287^molecular_function^magnesium ion binding GO:0005509^molecular_function^calcium ion binding.. 9.1933903.7067620.89531103.9970826.412797e-050.008066841
TRINITY_DN10190_c0_g1 TRINITY_DN10190_c0_g1_i1 . .TRINITY_DN10190_c0_g1_i1.p2 3-323[-] . . . .. . .. 3.5963753.4047441.00775763.4064406.581610e-040.042200953
TRINITY_DN101_c0_g1 TRINITY_DN101_c0_g1_i6 . .. . . . . .. . ..2916.0398453.0001340.82034653.6194632.952156e-040.023914234
TRINITY_DN101_c0_g4 TRINITY_DN101_c0_g4_i1 . .. . . . . .. . .. 389.6503512.9992870.69841274.2696121.958133e-050.003016171
TRINITY_DN10265_c0_g2 TRINITY_DN10265_c0_g2_i1 OTUBL_DROME^OTUBL_DROME^Q:19-780,H:1-261^53.64%ID^E:1.15e-94^RecName: Full=Ubiquitin thioesterase otubain-like;^Eukaryota; Metazoa; Ecdysozoa; Arthropoda; Hexapoda; Insecta; Pterygota; Neoptera; Endopterygota; Diptera; Brachycera; Muscomorpha; Ephydroidea; Drosophilidae; Drosophila; Sophophora.TRINITY_DN10265_c0_g2_i1.p1 19-783[+]OTUBL_DROME^OTUBL_DROME^Q:1-254,H:1-261^53.64%ID^E:7.52e-99^RecName: Full=Ubiquitin thioesterase otubain-like;^Eukaryota; Metazoa; Ecdysozoa; Arthropoda; Hexapoda; Insecta; Pterygota; Neoptera; Endopterygota; Diptera; Brachycera; Muscomorpha; Ephydroidea; Drosophilidae; Drosophila; SophophoraPF10275.12^Peptidase_C65^Peptidase C65 Otubain^27-254^E:1.3e-77. .GO:0005634^cellular_component^nucleus`GO:0004843^molecular_function^cysteine-type deubiquitinase activity`GO:0043130^molecular_function^ubiquitin binding`GO:0071108^biological_process^protein K48-linked deubiquitination. .. 3.8833604.4428630.97654214.5370305.705196e-060.001111423
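As used here, the keyed lookup setDT(Trinotate_lym_subset_skogs, key = 'transcript_id')[J(...)] returns one row per DE transcript, matching the DE table's transcript_id column (its first column) against the key and filling the annotation columns with NA when there is no match (data.table's default nomatch = NA). A base-R equivalent is a left join with merge(all.x = TRUE), which does not set a key on the annotation table in place; row and column order differ from the data.table result. A minimal sketch with hypothetical toy tables:

```r
# Hypothetical Trinotate-style annotation and DE tables, keyed by transcript_id
annot_toy <- data.frame(
  transcript_id        = c("tx1", "tx2", "tx3"),
  sprot_Top_BLASTX_hit = c("hitA", ".", "hitC"),
  stringsAsFactors     = FALSE
)
de_up_toy <- data.frame(
  transcript_id    = c("tx3", "tx1", "tx9"),
  log2FoldChange   = c(4.4, 3.7, 2.9),
  stringsAsFactors = FALSE
)

# Left join: every DE row is kept; tx9 has no annotation row and gets NA
annotated_toy <- merge(de_up_toy, annot_toy, by = "transcript_id", all.x = TRUE)
annotated_toy
```

Trinotate writes "." for a missing hit, so after the join an NA marks a transcript absent from the annotation table, while "." marks one that is present but has no hit.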
In [50]:
#extract all genes that are significantly upregulated in the Compound Eye (negative log2 fold change), i.e., downregulated in the upper lip relative to the eye

Upperlip_Vs_eye_sigOE_DOWNREGULATED_logfold <- sigOE_UpperlipVs_Eye %>%
        filter(padj < padj.cutoff & log2FoldChange < -lfc.cutoff)
In [2]:
#keep these two data frames for downstream analysis in Section 4.4

#genes that are significantly upregulated in the Upper Lip (positive log2 fold change)
head(Upperlip_Vs_eye_sigOE_UPREGULATED_logfold)

#genes that are significantly upregulated in the Compound Eye (negative log2 fold change), i.e., downregulated in the upper lip
head(Upperlip_Vs_eye_sigOE_DOWNREGULATED_logfold)
A tibble: 6 × 7
gene                       baseMean     log2FoldChange  lfcSE      stat       pvalue        padj
<chr>                      <dbl>        <dbl>           <dbl>      <dbl>      <dbl>         <dbl>
TRINITY_DN100127_c0_g1_i1  3.608228     4.480129        1.0997747  4.013424   5.984419e-05  0.007626394
TRINITY_DN101713_c0_g1_i1  9.193390     3.706762        0.8953110  3.997082   6.412797e-05  0.008066841
TRINITY_DN10190_c0_g1_i1   3.596375     3.404744        1.0077576  3.406440   6.581610e-04  0.042200953
TRINITY_DN101_c0_g1_i6     2916.039845  3.000134        0.8203465  3.619463   2.952156e-04  0.023914234
TRINITY_DN101_c0_g4_i1     389.650351   2.999287        0.6984127  4.269612   1.958133e-05  0.003016171
TRINITY_DN10265_c0_g2_i1   3.883360     4.442863        0.9765421  4.537030   5.705196e-06  0.001111423
A tibble: 6 × 7
gene                       baseMean   log2FoldChange  lfcSE      stat       pvalue        padj
<chr>                      <dbl>      <dbl>           <dbl>      <dbl>      <dbl>         <dbl>
TRINITY_DN10153_c1_g1_i1   47.303025  -1.928367       0.5315456  -3.618480  2.963379e-04  2.391423e-02
TRINITY_DN1015_c0_g1_i4    23.418584  -6.401983       1.4252019  -3.850651  1.178042e-04  1.255415e-02
TRINITY_DN10186_c1_g1_i1   4.266713   -4.489130       1.2628190  -3.476959  5.071345e-04  3.549361e-02
TRINITY_DN10299_c1_g1_i1   8.938083   -3.546956       0.8856172  -3.855551  1.154693e-04  1.234835e-02
TRINITY_DN103043_c0_g1_i1  73.858175  -6.823159       0.7991789  -8.118936  4.702892e-16  8.461056e-13
TRINITY_DN1040_c0_g1_i5    61.368383  -5.245897       0.6502786  -7.943474  1.965959e-15  3.340493e-12

DGE - Gut vs Compound Eye

In [52]:
#define contrasts, extract results table, and shrink the log2 fold changes

res_tableOE_unshrunken_GutVsEye <- results(dds_DE, contrast= c("sample_name", "Gut", "Eye") , alpha = 0.05)


res_tableOE_GutVsEye <- lfcShrink(dds_DE, contrast= c("sample_name", "Gut", "Eye"), res = res_tableOE_unshrunken_GutVsEye)
using 'normal' for LFC shrinkage, the Normal prior from Love et al (2014).

Note that type='apeglm' and type='ashr' have shown to have less bias than type='normal'.
See ?lfcShrink for more details on shrinkage type, and the DESeq2 vignette.
Reference: https://doi.org/10.1093/bioinformatics/bty895

In [53]:
mcols(res_tableOE_GutVsEye, use.names=T)
DataFrame with 6 rows and 2 columns
                       type                                    description
                <character>                                    <character>
baseMean       intermediate      mean of normalized counts for all samples
log2FoldChange      results log2 fold change (MAP): sample_name Gut vs Eye
lfcSE               results         standard error: sample_name Gut vs Eye
stat                results         Wald statistic: sample name Gut vs Eye
pvalue              results      Wald test p-value: sample name Gut vs Eye
padj                results                           BH adjusted p-values
In [54]:
res_tableOE_tb_GutVsEye <- res_tableOE_GutVsEye %>%
  data.frame() %>%
  rownames_to_column(var="gene") %>% 
  as_tibble()
In [55]:
#determine all differentially expressed genes 
sigOE_GutVsEye <- res_tableOE_tb_GutVsEye %>%
        filter(padj < padj.cutoff & abs(log2FoldChange) > lfc.cutoff)
In [56]:
colnames(sigOE_GutVsEye)[1]<- "transcript_id"
In [57]:
#extract all genes that are significantly upregulated in the gut (positive log2 fold change)
GutVsEye_sigOE_UPREGULATED_logfold <- sigOE_GutVsEye %>%
        filter(padj < padj.cutoff & log2FoldChange > lfc.cutoff)
In [58]:
#add the Trinotate annotation
GutVsEye_sigOE_UPREGULATED_logfold_annot <- setDT(Trinotate_lym_subset_skogs, key = 'transcript_id')[J(GutVsEye_sigOE_UPREGULATED_logfold)]
In [4]:
#genes that are significantly upregulated in the gut (positive log2 fold change)
head(GutVsEye_sigOE_UPREGULATED_logfold_annot)
A data.table: 6 × 23
#gene_idtranscript_idsprot_Top_BLASTX_hitRNAMMERprot_idprot_coordssprot_Top_BLASTP_hitPfamSignalPTmHMMgene_ontology_BLASTPgene_ontology_PfamtranscriptpeptidebaseMeanlog2FoldChangelfcSEstatpvaluepadj
<chr><chr><chr><chr><chr><chr><chr><chr><chr><chr><chr><chr><chr><chr><dbl><dbl><dbl><dbl><dbl><dbl>
TRINITY_DN0_c0_g1 TRINITY_DN0_c0_g1_i2 CPVL_MOUSE^CPVL_MOUSE^Q:2582-1206,H:3-471^44.186%ID^E:2.53e-133^RecName: Full=Probable serine carboxypeptidase CPVL;^Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus .TRINITY_DN0_c0_g1_i2.p1 1194-2573[-]CPVL_MOUSE^CPVL_MOUSE^Q:2-456,H:7-471^44.35%ID^E:3.55e-138^RecName: Full=Probable serine carboxypeptidase CPVL;^Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus PF00450.25^Peptidase_S10^Serine carboxypeptidase^67-454^E:2.7e-98sigP:1^17^0.818.GO:0004185^molecular_function^serine-type carboxypeptidase activity GO:0004185^molecular_function^serine-type carboxypeptidase activity`GO:0006508^biological_process^proteolysis.. 87.7281344.1161020.75983925.3798697.454017e-084.057540e-06
TRINITY_DN0_c0_g4 TRINITY_DN0_c0_g4_i2 VCP_APIME^VCP_APIME^Q:1935-634,H:24-463^44.743%ID^E:1.97e-118^RecName: Full=Venom serine carboxypeptidase;^Eukaryota; Metazoa; Ecdysozoa; Arthropoda; Hexapoda; Insecta; Pterygota; Neoptera; Endopterygota; Hymenoptera; Apocrita; Aculeata; Apoidea; Apidae; Apis .TRINITY_DN0_c0_g4_i2.p1 619-2025[-] VCP_APIME^VCP_APIME^Q:6-463,H:3-462^43.856%ID^E:1.3e-122^RecName: Full=Venom serine carboxypeptidase;^Eukaryota; Metazoa; Ecdysozoa; Arthropoda; Hexapoda; Insecta; Pterygota; Neoptera; Endopterygota; Hymenoptera; Apocrita; Aculeata; Apoidea; Apidae; Apis PF00450.25^Peptidase_S10^Serine carboxypeptidase^75-465^E:5.3e-92sigP:1^19^0.803.GO:0005576^cellular_component^extracellular region`GO:0004185^molecular_function^serine-type carboxypeptidase activity GO:0004185^molecular_function^serine-type carboxypeptidase activity`GO:0006508^biological_process^proteolysis..223.7656622.7134500.82052883.2924729.931093e-041.222380e-02
TRINITY_DN10004_c4_g1 TRINITY_DN10004_c4_g1_i1 . .. . . . . .. . .. 1.7990573.4721971.04587783.2993939.689408e-041.200249e-02
TRINITY_DN10008_c0_g1 TRINITY_DN10008_c0_g1_i1 ENDUB_DANRE^ENDUB_DANRE^Q:1009-266,H:7-282^28.261%ID^E:1.58e-20^RecName: Full=Uridylate-specific endoribonuclease B {ECO:0000305};^Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Actinopterygii; Neopterygii; Teleostei; Ostariophysi; Cypriniformes; Danionidae; Danioninae; Danio.TRINITY_DN10008_c0_g1_i1.p1179-1096[-] ENDUB_DANRE^ENDUB_DANRE^Q:24-277,H:1-282^28.723%ID^E:1.04e-24^RecName: Full=Uridylate-specific endoribonuclease B {ECO:0000305};^Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Actinopterygii; Neopterygii; Teleostei; Ostariophysi; Cypriniformes; Danionidae; Danioninae; DanioPF09412.13^XendoU^Endoribonuclease XendoU^31-274^E:6.7e-45 sigP:1^21^0.779.GO:0004521^molecular_function^endoribonuclease activity`GO:0016829^molecular_function^lyase activity`GO:0046872^molecular_function^metal ion binding`GO:0003723^molecular_function^RNA bindingGO:0004521^molecular_function^endoribonuclease activity .. 8.6055744.9269321.25191273.9178178.935435e-051.798068e-03
TRINITY_DN100098_c0_g1TRINITY_DN100098_c0_g1_i1. .. . . . . .. . .. 17.8356827.0633790.92348007.3637761.787794e-133.867227e-11
TRINITY_DN1000_c1_g1 TRINITY_DN1000_c1_g1_i2 . .. . . . . .. . ..134.1704101.3700240.38040163.6002183.179503e-044.979644e-03
In [61]:
#extract all genes that are significantly upregulated in the Compound Eye (negative log2 fold change), i.e., downregulated in the gut
GutVsEye_sigOE_DOWNREGULATED_logfold <- sigOE_GutVsEye %>%
        filter(padj < padj.cutoff & log2FoldChange < -lfc.cutoff)
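Splitting a results table into up- and downregulated genes is most transparent with a symmetric threshold: "> cutoff" for one direction and "< -cutoff" for the other. This stays correct even when the input has not already been filtered on abs(log2FoldChange). Below is a base R sketch on a made-up table. The cutoffs 0.05 and 0.58 are hypothetical stand-ins for the notebook's padj.cutoff and lfc.cutoff, which are set in an earlier cell.

```r
# Made-up DESeq2-style results; padj = NA mimics a gene removed by independent filtering.
res <- data.frame(
  gene = c("g1", "g2", "g3", "g4", "g5"),
  log2FoldChange = c(2.1, -3.0, 0.4, -0.2, 1.5),
  padj = c(0.001, 0.010, 0.020, 0.300, NA),
  stringsAsFactors = FALSE
)
padj_cutoff <- 0.05  # hypothetical; the notebook sets padj.cutoff earlier
lfc_cutoff  <- 0.58  # hypothetical; the notebook sets lfc.cutoff earlier

# which() drops rows where the condition is NA, as dplyr::filter() does.
sig <- res[which(res$padj < padj_cutoff & abs(res$log2FoldChange) > lfc_cutoff), ]

# Symmetric thresholds: positive = higher in the numerator tissue,
# negative = higher in the reference (denominator) tissue.
up   <- sig[sig$log2FoldChange >  lfc_cutoff, ]
down <- sig[sig$log2FoldChange < -lfc_cutoff, ]

up$gene    # "g1"
down$gene  # "g2"
```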
In [3]:
#inspect the two dataframes kept for downstream analysis

#genes that are significantly upregulated in the Gut (positive log2 fold change)
head(GutVsEye_sigOE_UPREGULATED_logfold)

#genes that are significantly upregulated in the Compound Eye (negative log2 fold change), i.e., downregulated in the gut
head(GutVsEye_sigOE_DOWNREGULATED_logfold)
A tibble: 6 × 7
gene                       baseMean     log2FoldChange  lfcSE      stat       pvalue        padj
<chr>                      <dbl>        <dbl>           <dbl>      <dbl>      <dbl>         <dbl>
TRINITY_DN0_c0_g1_i2       87.728134    4.116102        0.7598392  5.379869   7.454017e-08  4.057540e-06
TRINITY_DN0_c0_g4_i2       223.765662   2.713450        0.8205288  3.292472   9.931093e-04  1.222380e-02
TRINITY_DN10004_c4_g1_i1   1.799057     3.472197        1.0458778  3.299393   9.689408e-04  1.200249e-02
TRINITY_DN10008_c0_g1_i1   8.605574     4.926932        1.2519127  3.917817   8.935435e-05  1.798068e-03
TRINITY_DN100098_c0_g1_i1  17.835682    7.063379        0.9234800  7.363776   1.787794e-13  3.867227e-11
TRINITY_DN1000_c1_g1_i2    134.170410   1.370024        0.3804016  3.600218   3.179503e-04  4.979644e-03
A tibble: 6 × 7
gene                       baseMean     log2FoldChange  lfcSE      stat       pvalue        padj
<chr>                      <dbl>        <dbl>           <dbl>      <dbl>      <dbl>         <dbl>
TRINITY_DN10002_c1_g1_i1   2.520034     -3.373872       1.1281647  -2.896979  3.767749e-03  3.257075e-02
TRINITY_DN10016_c3_g1_i2   46.654265    -2.021507       0.5571204  -3.628127  2.854847e-04  4.588174e-03
TRINITY_DN10036_c0_g1_i1   12.458310    -2.076170       0.6114870  -3.390928  6.965633e-04  9.174321e-03
TRINITY_DN10044_c1_g1_i1   2.893933     -3.509221       1.0126332  -3.316095  9.128492e-04  1.143451e-02
TRINITY_DN1004_c0_g1_i5    793.893694   -3.975748       0.7871186  -5.073644  3.902699e-07  1.732798e-05
TRINITY_DN10062_c0_g1_i1   9.145714     -3.762179       1.0732028  -3.471206  5.181254e-04  7.298007e-03

DGE - Upper Lip vs Gut

In [63]:
#now set gut as a reference tissue 
dds_DE$sample_name <- relevel(dds_DE$sample_name, ref= "Gut")
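relevel() only changes which tissue is the reference (first) level of sample_name. That choice determines the default coefficients listed by resultsNames(). However, a results() call with an explicit contrast names both the numerator and the denominator, so the direction of the comparison is fixed by the contrast itself. Below is a base R sketch of what relevel() does to a factor, using made-up sample labels.

```r
# Made-up sample labels; factor levels default to alphabetical order.
tissue <- factor(c("Upper_lip", "Gut", "Compound_eye", "Gut", "Upper_lip"))
levels(tissue)          # "Compound_eye" "Gut" "Upper_lip"

# relevel() moves "Gut" to the front (the reference level); values are unchanged.
tissue_gut_ref <- relevel(tissue, ref = "Gut")
levels(tissue_gut_ref)  # "Gut" "Compound_eye" "Upper_lip"
```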
In [64]:
#rerun DESeq 
dds_DE <- DESeq(dds_DE)
using pre-existing size factors

estimating dispersions

found already estimated dispersions, replacing these

gene-wise dispersion estimates

mean-dispersion relationship

final dispersion estimates

fitting model and testing

In [65]:
#define contrasts, extract results table, and shrink the log2 fold changes

res_tableOE_unshrunken_UpperLipVsGut <- results(dds_DE, contrast= c("sample_name", "Upper_lip", "Gut") , alpha = 0.05)


res_tableOE_UpperLipVsGut <- lfcShrink(dds_DE, contrast= c("sample_name", "Upper_lip", "Gut"), res = res_tableOE_unshrunken_UpperLipVsGut)
using 'normal' for LFC shrinkage, the Normal prior from Love et al (2014).

Note that type='apeglm' and type='ashr' have shown to have less bias than type='normal'.
See ?lfcShrink for more details on shrinkage type, and the DESeq2 vignette.
Reference: https://doi.org/10.1093/bioinformatics/bty895
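With contrast = c("sample_name", "Upper_lip", "Gut"), the reported log2 fold change is log2(Upper_lip / Gut). Positive values therefore mean higher expression in the upper lip, and negative values mean higher expression in the gut. The sketch below computes a naive version from made-up normalized counts purely to show this sign convention. DESeq2's estimate comes from a fitted negative binomial GLM and, after lfcShrink(), is shrunken toward zero, so real values will differ.

```r
# Made-up normalized counts for one gene, three replicates per tissue.
upper_lip <- c(120, 100, 140)
gut       <- c(30, 25, 35)

# Naive log2 fold change for contrast c("sample_name", "Upper_lip", "Gut"):
# numerator = Upper_lip, denominator = Gut.
naive_lfc <- log2(mean(upper_lip) / mean(gut))
naive_lfc                           # 2: four-fold higher in the upper lip
log2(mean(gut) / mean(upper_lip))   # -2: swapping the contrast flips the sign
```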

In [66]:
mcols(res_tableOE_UpperLipVsGut, use.names=T)
DataFrame with 6 rows and 2 columns
                       type
                <character>
baseMean       intermediate
log2FoldChange      results
lfcSE               results
stat                results
pvalue              results
padj                results
                                                        description
                                                        <character>
baseMean                  mean of normalized counts for all samples
log2FoldChange log2 fold change (MAP): sample_name Upper_lip vs Gut
lfcSE                  standard error: sample_name Upper_lip vs Gut
stat                   Wald statistic: sample name Upper lip vs Gut
pvalue              Wald test p-value: sample name Upper lip vs Gut
padj                                           BH adjusted p-values
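The padj column holds Benjamini-Hochberg (BH) adjusted p-values. Below is a base R sketch of the BH step-up adjustment, written out and checked against stats::p.adjust() on made-up p-values; bh_adjust() is a hypothetical helper. Note that DESeq2's results() also applies independent filtering by default, so some genes receive padj = NA, and filter(padj < padj.cutoff) drops those rows.

```r
# Benjamini-Hochberg adjustment written out step by step (hypothetical helper).
bh_adjust <- function(p) {
  n <- length(p)
  o <- order(p, decreasing = TRUE)   # largest p-value first
  ranks <- n:1                       # ascending rank of each p[o]
  adj <- cummin(p[o] * n / ranks)    # enforce monotonicity from the largest p down
  adj <- pmin(adj, 1)                # adjusted values cannot exceed 1
  adj[order(o)]                      # restore the input order
}

p <- c(0.01, 0.04, 0.03, 0.20)
bh_adjust(p)                  # 0.04000 0.05333 0.05333 0.20000
p.adjust(p, method = "BH")    # same values
```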
In [67]:
res_tableOE_tb_UpperLipVsGut <- res_tableOE_UpperLipVsGut %>%
  data.frame() %>%
  rownames_to_column(var="gene") %>% 
  as_tibble()
In [68]:
#determine all differentially expressed genes 
sigOE_UpperLipVsGut <- res_tableOE_tb_UpperLipVsGut %>%
        filter(padj < padj.cutoff & abs(log2FoldChange) > lfc.cutoff)
In [69]:
#extract all genes that are significantly upregulated in the upper lip (positive log2 fold change)
UpperLipVsGut_sigOE_UPREGULATED_logfold <- sigOE_UpperLipVsGut %>%
        filter(padj < padj.cutoff & log2FoldChange > lfc.cutoff)
In [5]:
#genes that are significantly upregulated in the upper lip (positive log2 fold change)
head(UpperLipVsGut_sigOE_UPREGULATED_logfold )
A tibble: 6 × 7
gene                       baseMean     log2FoldChange  lfcSE      stat       pvalue        padj
<chr>                      <dbl>        <dbl>           <dbl>      <dbl>      <dbl>         <dbl>
TRINITY_DN100127_c0_g1_i1  3.608228     2.729312        0.9341722  2.880301   3.972962e-03  3.335116e-02
TRINITY_DN10016_c3_g1_i2   46.654265    2.277662        0.5551594  4.102007   4.095821e-05  8.875274e-04
TRINITY_DN10031_c3_g1_i1   16.544037    1.833604        0.3699002  4.954538   7.250222e-07  3.064050e-05
TRINITY_DN100493_c0_g1_i1  4.249253     3.571458        1.1454913  3.046927   2.311939e-03  2.223371e-02
TRINITY_DN1004_c0_g1_i5    793.893694   5.194943        0.7868477  6.614398   3.730678e-11  5.611549e-09
TRINITY_DN10062_c0_g1_i1   9.145714     3.531921        1.0737770  3.268208   1.082309e-03  1.250321e-02
In [71]:
#extract all genes that are significantly upregulated in the Gut (negative log2 fold change)
UpperLipVsGut_sigOE_DOWNREGULATED_logfold <- sigOE_UpperLipVsGut %>%
        filter(padj < padj.cutoff & log2FoldChange < -lfc.cutoff)
In [6]:
#view genes that are significantly upregulated in the Gut (negative log2 fold change), i.e., downregulated in the Upper Lip
head(UpperLipVsGut_sigOE_DOWNREGULATED_logfold)
A tibble: 6 × 7
gene                       baseMean     log2FoldChange  lfcSE      stat       pvalue        padj
<chr>                      <dbl>        <dbl>           <dbl>      <dbl>      <dbl>         <dbl>
TRINITY_DN0_c0_g1_i2       87.72813     -4.759678       0.7690108  -6.137313  8.392890e-10  8.887781e-08
TRINITY_DN0_c0_g3_i1       20.64123     -2.295663       0.8282384  -2.758428  5.808017e-03  4.411316e-02
TRINITY_DN0_c0_g4_i2       223.76566    -3.943488       0.8241288  -4.774825  1.798639e-06  6.543283e-05
TRINITY_DN100098_c0_g1_i1  17.83568     -6.274531       0.8954141  -6.489694  8.601097e-11  1.178318e-08
TRINITY_DN1000_c1_g1_i2    134.17041    -1.520451       0.3803992  -3.995854  6.446143e-05  1.293863e-03
TRINITY_DN1000_c2_g1_i2    13.27825     -1.806258       0.5411564  -3.324785  8.848688e-04  1.073042e-02
In [4]:
#inspect the two dataframes kept for downstream analysis

#genes that are significantly upregulated in the upper lip (positive log2 fold change)
head(UpperLipVsGut_sigOE_UPREGULATED_logfold)

#genes that are significantly upregulated in the gut (negative log2fold change) but that are downregulated in the Upper Lip  
head(UpperLipVsGut_sigOE_DOWNREGULATED_logfold)
A tibble: 6 × 7
gene                       baseMean     log2FoldChange  lfcSE      stat       pvalue        padj
<chr>                      <dbl>        <dbl>           <dbl>      <dbl>      <dbl>         <dbl>
TRINITY_DN100127_c0_g1_i1  3.608228     2.729312        0.9341722  2.880301   3.972962e-03  3.335116e-02
TRINITY_DN10016_c3_g1_i2   46.654265    2.277662        0.5551594  4.102007   4.095821e-05  8.875274e-04
TRINITY_DN10031_c3_g1_i1   16.544037    1.833604        0.3699002  4.954538   7.250222e-07  3.064050e-05
TRINITY_DN100493_c0_g1_i1  4.249253     3.571458        1.1454913  3.046927   2.311939e-03  2.223371e-02
TRINITY_DN1004_c0_g1_i5    793.893694   5.194943        0.7868477  6.614398   3.730678e-11  5.611549e-09
TRINITY_DN10062_c0_g1_i1   9.145714     3.531921        1.0737770  3.268208   1.082309e-03  1.250321e-02
A tibble: 6 × 7
gene                       baseMean     log2FoldChange  lfcSE      stat       pvalue        padj
<chr>                      <dbl>        <dbl>           <dbl>      <dbl>      <dbl>         <dbl>
TRINITY_DN0_c0_g1_i2       87.72813     -4.759678       0.7690108  -6.137313  8.392890e-10  8.887781e-08
TRINITY_DN0_c0_g3_i1       20.64123     -2.295663       0.8282384  -2.758428  5.808017e-03  4.411316e-02
TRINITY_DN0_c0_g4_i2       223.76566    -3.943488       0.8241288  -4.774825  1.798639e-06  6.543283e-05
TRINITY_DN100098_c0_g1_i1  17.83568     -6.274531       0.8954141  -6.489694  8.601097e-11  1.178318e-08
TRINITY_DN1000_c1_g1_i2    134.17041    -1.520451       0.3803992  -3.995854  6.446143e-05  1.293863e-03
TRINITY_DN1000_c2_g1_i2    13.27825     -1.806258       0.5411564  -3.324785  8.848688e-04  1.073042e-02

Determine tissue-specific expression

To identify tissue-specific differential expression (i.e., significantly upregulated genes found in only one tissue), each tissue was compared with the other two. The upregulated genes from its two pairwise comparisons were pooled, and tissue-specific genes were taken from the region of the Venn diagram unique to that tissue, not from the intersection (see Section 4.4.4). For example, the upper lip set was built from the Upper Lip vs Gut and Upper Lip vs Compound Eye comparisons.
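The pooling and exclusion steps can be sketched in base R on made-up gene IDs. Because the pooled set is a union, a gene counts as upregulated in a tissue if it passes the cutoffs in at least one of that tissue's two pairwise comparisons. It is then kept as tissue-specific only if it is absent from both other pooled sets. Here unique(), union(), and setdiff() stand in for the rbind()/duplicated()/anti_join() steps used in the notebook.

```r
# Made-up upregulated gene IDs from the two upper-lip pairwise comparisons.
lip_vs_gut_up <- c("g1", "g2", "g3")
lip_vs_eye_up <- c("g2", "g4")
# Made-up pooled sets for the other two tissues.
gut_up <- c("g3", "g5")
eye_up <- c("g6")

# Pool the two upper-lip comparisons and drop duplicates (a union).
lip_up <- unique(c(lip_vs_gut_up, lip_vs_eye_up))

# Keep genes that appear in neither of the other tissues' pooled sets.
lip_specific <- setdiff(lip_up, union(gut_up, eye_up))
lip_specific   # "g1" "g2" "g4"
```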

Upper lip

In [7]:
#create a dataframe with all significantly upregulated genes of the upper lip
#merge dataframes that have significantly upregulated genes of the upper lip from the pairwise comparisons Upper Lip vs Gut and Upper Lip vs Eye

head(UpperLipVsGut_sigOE_UPREGULATED_logfold)

head(Upperlip_Vs_eye_sigOE_UPREGULATED_logfold)
A tibble: 6 × 7
gene                       baseMean     log2FoldChange  lfcSE      stat       pvalue        padj
<chr>                      <dbl>        <dbl>           <dbl>      <dbl>      <dbl>         <dbl>
TRINITY_DN100127_c0_g1_i1  3.608228     2.729312        0.9341722  2.880301   3.972962e-03  3.335116e-02
TRINITY_DN10016_c3_g1_i2   46.654265    2.277662        0.5551594  4.102007   4.095821e-05  8.875274e-04
TRINITY_DN10031_c3_g1_i1   16.544037    1.833604        0.3699002  4.954538   7.250222e-07  3.064050e-05
TRINITY_DN100493_c0_g1_i1  4.249253     3.571458        1.1454913  3.046927   2.311939e-03  2.223371e-02
TRINITY_DN1004_c0_g1_i5    793.893694   5.194943        0.7868477  6.614398   3.730678e-11  5.611549e-09
TRINITY_DN10062_c0_g1_i1   9.145714     3.531921        1.0737770  3.268208   1.082309e-03  1.250321e-02
A tibble: 6 × 7
gene                       baseMean     log2FoldChange  lfcSE      stat       pvalue        padj
<chr>                      <dbl>        <dbl>           <dbl>      <dbl>      <dbl>         <dbl>
TRINITY_DN100127_c0_g1_i1  3.608228     4.480129        1.0997747  4.013424   5.984419e-05  0.007626394
TRINITY_DN101713_c0_g1_i1  9.193390     3.706762        0.8953110  3.997082   6.412797e-05  0.008066841
TRINITY_DN10190_c0_g1_i1   3.596375     3.404744        1.0077576  3.406440   6.581610e-04  0.042200953
TRINITY_DN101_c0_g1_i6     2916.039845  3.000134        0.8203465  3.619463   2.952156e-04  0.023914234
TRINITY_DN101_c0_g4_i1     389.650351   2.999287        0.6984127  4.269612   1.958133e-05  0.003016171
TRINITY_DN10265_c0_g2_i1   3.883360     4.442863        0.9765421  4.537030   5.705196e-06  0.001111423
In [77]:
colnames(UpperLipVsGut_sigOE_UPREGULATED_logfold)[1]<- "gene"
In [78]:
colnames(Upperlip_Vs_eye_sigOE_UPREGULATED_logfold)[1]<- "gene"
In [79]:
#merge
merged_upper_lips_df <- rbind(
  UpperLipVsGut_sigOE_UPREGULATED_logfold,
  Upperlip_Vs_eye_sigOE_UPREGULATED_logfold
)
In [80]:
head(merged_upper_lips_df)
A tibble: 6 × 7
gene                       baseMean     log2FoldChange  lfcSE      stat       pvalue        padj
<chr>                      <dbl>        <dbl>           <dbl>      <dbl>      <dbl>         <dbl>
TRINITY_DN100127_c0_g1_i1  3.608228     2.729312        0.9341722  2.880301   3.972962e-03  3.335116e-02
TRINITY_DN10016_c3_g1_i2   46.654265    2.277662        0.5551594  4.102007   4.095821e-05  8.875274e-04
TRINITY_DN10031_c3_g1_i1   16.544037    1.833604        0.3699002  4.954538   7.250222e-07  3.064050e-05
TRINITY_DN100493_c0_g1_i1  4.249253     3.571458        1.1454913  3.046927   2.311939e-03  2.223371e-02
TRINITY_DN1004_c0_g1_i5    793.893694   5.194943        0.7868477  6.614398   3.730678e-11  5.611549e-09
TRINITY_DN10062_c0_g1_i1   9.145714     3.531921        1.0737770  3.268208   1.082309e-03  1.250321e-02
In [81]:
#the same gene can be found in both Upper Lip vs Gut and Upper Lip vs Eye. Remove duplicate genes, keeping the first occurrence (the Upper Lip vs Gut row).
merged_upper_lips_unique <- merged_upper_lips_df[!duplicated(merged_upper_lips_df$gene), ]
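Which duplicate is retained matters if the statistics in these tables are reported later. !duplicated() keeps the first occurrence, so for a gene present in both pairwise tables, the row from the table passed first to rbind() survives. Below is a base R sketch with made-up values.

```r
# Two made-up pairwise tables that share gene "g2" with different statistics.
vs_gut <- data.frame(gene = c("g1", "g2"), log2FoldChange = c(2.7, 2.3),
                     stringsAsFactors = FALSE)
vs_eye <- data.frame(gene = c("g2", "g3"), log2FoldChange = c(4.5, 3.7),
                     stringsAsFactors = FALSE)
merged <- rbind(vs_gut, vs_eye)

# !duplicated() flags only the second and later copies, so g2 keeps its vs_gut value.
merged_unique <- merged[!duplicated(merged$gene), ]
merged_unique$gene                                                 # "g1" "g2" "g3"
merged_unique$log2FoldChange[merged_unique$gene == "g2"]           # 2.3
```

dplyr::distinct(merged, gene, .keep_all = TRUE) also keeps the first occurrence and would give the same rows.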
In [82]:
head(merged_upper_lips_unique)
A tibble: 6 × 7
genebaseMeanlog2FoldChangelfcSEstatpvaluepadj
<chr><dbl><dbl><dbl><dbl><dbl><dbl>
TRINITY_DN100127_c0_g1_i1 3.6082282.7293120.93417222.8803013.972962e-033.335116e-02
TRINITY_DN10016_c3_g1_i2 46.6542652.2776620.55515944.1020074.095821e-058.875274e-04
TRINITY_DN10031_c3_g1_i1 16.5440371.8336040.36990024.9545387.250222e-073.064050e-05
TRINITY_DN100493_c0_g1_i1 4.2492533.5714581.14549133.0469272.311939e-032.223371e-02
TRINITY_DN1004_c0_g1_i5 793.8936945.1949430.78684776.6143983.730678e-115.611549e-09
TRINITY_DN10062_c0_g1_i1 9.1457143.5319211.07377703.2682081.082309e-031.250321e-02

Compound eye

In [8]:
#create a dataframe with all significantly upregulated genes of the compound eye
#merge dataframes that have significantly upregulated genes of the compound eye from the pairwise comparisons Upper Lip vs Compound Eye and Gut vs Compound Eye

#genes that are significantly upregulated in the Compound Eye relative to the upper lip
head(Upperlip_Vs_eye_sigOE_DOWNREGULATED_logfold)

#genes that are significantly upregulated in the Compound Eye relative to the gut
head(GutVsEye_sigOE_DOWNREGULATED_logfold)
A tibble: 6 × 7
gene                       baseMean     log2FoldChange  lfcSE      stat       pvalue        padj
<chr>                      <dbl>        <dbl>           <dbl>      <dbl>      <dbl>         <dbl>
TRINITY_DN10153_c1_g1_i1   47.303025    -1.928367       0.5315456  -3.618480  2.963379e-04  2.391423e-02
TRINITY_DN1015_c0_g1_i4    23.418584    -6.401983       1.4252019  -3.850651  1.178042e-04  1.255415e-02
TRINITY_DN10186_c1_g1_i1   4.266713     -4.489130       1.2628190  -3.476959  5.071345e-04  3.549361e-02
TRINITY_DN10299_c1_g1_i1   8.938083     -3.546956       0.8856172  -3.855551  1.154693e-04  1.234835e-02
TRINITY_DN103043_c0_g1_i1  73.858175    -6.823159       0.7991789  -8.118936  4.702892e-16  8.461056e-13
TRINITY_DN1040_c0_g1_i5    61.368383    -5.245897       0.6502786  -7.943474  1.965959e-15  3.340493e-12
A tibble: 6 × 7
gene                       baseMean     log2FoldChange  lfcSE      stat       pvalue        padj
<chr>                      <dbl>        <dbl>           <dbl>      <dbl>      <dbl>         <dbl>
TRINITY_DN10002_c1_g1_i1   2.520034     -3.373872       1.1281647  -2.896979  3.767749e-03  3.257075e-02
TRINITY_DN10016_c3_g1_i2   46.654265    -2.021507       0.5571204  -3.628127  2.854847e-04  4.588174e-03
TRINITY_DN10036_c0_g1_i1   12.458310    -2.076170       0.6114870  -3.390928  6.965633e-04  9.174321e-03
TRINITY_DN10044_c1_g1_i1   2.893933     -3.509221       1.0126332  -3.316095  9.128492e-04  1.143451e-02
TRINITY_DN1004_c0_g1_i5    793.893694   -3.975748       0.7871186  -5.073644  3.902699e-07  1.732798e-05
TRINITY_DN10062_c0_g1_i1   9.145714     -3.762179       1.0732028  -3.471206  5.181254e-04  7.298007e-03
In [84]:
colnames(Upperlip_Vs_eye_sigOE_DOWNREGULATED_logfold)[1]<- "gene"
In [85]:
colnames(GutVsEye_sigOE_DOWNREGULATED_logfold)[1]<- "gene"
In [86]:
#merged eye
merged_Eye_df <- rbind(Upperlip_Vs_eye_sigOE_DOWNREGULATED_logfold , GutVsEye_sigOE_DOWNREGULATED_logfold)
In [87]:
head(merged_Eye_df)
A tibble: 6 × 7
gene                       baseMean     log2FoldChange  lfcSE      stat       pvalue        padj
<chr>                      <dbl>        <dbl>           <dbl>      <dbl>      <dbl>         <dbl>
TRINITY_DN10153_c1_g1_i1   47.303025    -1.928367       0.5315456  -3.618480  2.963379e-04  2.391423e-02
TRINITY_DN1015_c0_g1_i4    23.418584    -6.401983       1.4252019  -3.850651  1.178042e-04  1.255415e-02
TRINITY_DN10186_c1_g1_i1   4.266713     -4.489130       1.2628190  -3.476959  5.071345e-04  3.549361e-02
TRINITY_DN10299_c1_g1_i1   8.938083     -3.546956       0.8856172  -3.855551  1.154693e-04  1.234835e-02
TRINITY_DN103043_c0_g1_i1  73.858175    -6.823159       0.7991789  -8.118936  4.702892e-16  8.461056e-13
TRINITY_DN1040_c0_g1_i5    61.368383    -5.245897       0.6502786  -7.943474  1.965959e-15  3.340493e-12
In [88]:
#the same gene can be found in both Upper Lip vs Compound Eye and Gut vs Compound Eye. Remove duplicate genes, keeping the first occurrence (the Upper Lip vs Compound Eye row).
merged_Eye_unique <-merged_Eye_df[!duplicated(merged_Eye_df$gene), ]
In [89]:
head(merged_Eye_unique)
A tibble: 6 × 7
gene                       baseMean     log2FoldChange  lfcSE      stat       pvalue        padj
<chr>                      <dbl>        <dbl>           <dbl>      <dbl>      <dbl>         <dbl>
TRINITY_DN10153_c1_g1_i1   47.303025    -1.928367       0.5315456  -3.618480  2.963379e-04  2.391423e-02
TRINITY_DN1015_c0_g1_i4    23.418584    -6.401983       1.4252019  -3.850651  1.178042e-04  1.255415e-02
TRINITY_DN10186_c1_g1_i1   4.266713     -4.489130       1.2628190  -3.476959  5.071345e-04  3.549361e-02
TRINITY_DN10299_c1_g1_i1   8.938083     -3.546956       0.8856172  -3.855551  1.154693e-04  1.234835e-02
TRINITY_DN103043_c0_g1_i1  73.858175    -6.823159       0.7991789  -8.118936  4.702892e-16  8.461056e-13
TRINITY_DN1040_c0_g1_i5    61.368383    -5.245897       0.6502786  -7.943474  1.965959e-15  3.340493e-12

Gut

In [9]:
#create a dataframe with all significantly upregulated genes of the gut
#merge dataframes that have significantly upregulated genes of the gut from the pairwise comparisons Gut vs Eye and Upper Lip vs Gut

#genes that are significantly upregulated in the Gut relative to the compound eye
head(GutVsEye_sigOE_UPREGULATED_logfold)

#genes that are significantly upregulated in the gut relative to the upper lip
head(UpperLipVsGut_sigOE_DOWNREGULATED_logfold)
A tibble: 6 × 7
gene                       baseMean     log2FoldChange  lfcSE      stat       pvalue        padj
<chr>                      <dbl>        <dbl>           <dbl>      <dbl>      <dbl>         <dbl>
TRINITY_DN0_c0_g1_i2       87.728134    4.116102        0.7598392  5.379869   7.454017e-08  4.057540e-06
TRINITY_DN0_c0_g4_i2       223.765662   2.713450        0.8205288  3.292472   9.931093e-04  1.222380e-02
TRINITY_DN10004_c4_g1_i1   1.799057     3.472197        1.0458778  3.299393   9.689408e-04  1.200249e-02
TRINITY_DN10008_c0_g1_i1   8.605574     4.926932        1.2519127  3.917817   8.935435e-05  1.798068e-03
TRINITY_DN100098_c0_g1_i1  17.835682    7.063379        0.9234800  7.363776   1.787794e-13  3.867227e-11
TRINITY_DN1000_c1_g1_i2    134.170410   1.370024        0.3804016  3.600218   3.179503e-04  4.979644e-03
A tibble: 6 × 7
gene                       baseMean     log2FoldChange  lfcSE      stat       pvalue        padj
<chr>                      <dbl>        <dbl>           <dbl>      <dbl>      <dbl>         <dbl>
TRINITY_DN0_c0_g1_i2       87.72813     -4.759678       0.7690108  -6.137313  8.392890e-10  8.887781e-08
TRINITY_DN0_c0_g3_i1       20.64123     -2.295663       0.8282384  -2.758428  5.808017e-03  4.411316e-02
TRINITY_DN0_c0_g4_i2       223.76566    -3.943488       0.8241288  -4.774825  1.798639e-06  6.543283e-05
TRINITY_DN100098_c0_g1_i1  17.83568     -6.274531       0.8954141  -6.489694  8.601097e-11  1.178318e-08
TRINITY_DN1000_c1_g1_i2    134.17041    -1.520451       0.3803992  -3.995854  6.446143e-05  1.293863e-03
TRINITY_DN1000_c2_g1_i2    13.27825     -1.806258       0.5411564  -3.324785  8.848688e-04  1.073042e-02
In [91]:
colnames(GutVsEye_sigOE_UPREGULATED_logfold)[1]<- "gene"
In [92]:
colnames(UpperLipVsGut_sigOE_DOWNREGULATED_logfold)[1]<- "gene"
In [93]:
#merge
merged_Gut_df <- rbind(GutVsEye_sigOE_UPREGULATED_logfold , UpperLipVsGut_sigOE_DOWNREGULATED_logfold)
In [94]:
head(merged_Gut_df)
A tibble: 6 × 7
gene                       baseMean     log2FoldChange  lfcSE      stat       pvalue        padj
<chr>                      <dbl>        <dbl>           <dbl>      <dbl>      <dbl>         <dbl>
TRINITY_DN0_c0_g1_i2       87.728134    4.116102        0.7598392  5.379869   7.454017e-08  4.057540e-06
TRINITY_DN0_c0_g4_i2       223.765662   2.713450        0.8205288  3.292472   9.931093e-04  1.222380e-02
TRINITY_DN10004_c4_g1_i1   1.799057     3.472197        1.0458778  3.299393   9.689408e-04  1.200249e-02
TRINITY_DN10008_c0_g1_i1   8.605574     4.926932        1.2519127  3.917817   8.935435e-05  1.798068e-03
TRINITY_DN100098_c0_g1_i1  17.835682    7.063379        0.9234800  7.363776   1.787794e-13  3.867227e-11
TRINITY_DN1000_c1_g1_i2    134.170410   1.370024        0.3804016  3.600218   3.179503e-04  4.979644e-03
In [95]:
#the same gene can be found in both Gut vs Eye and Upper Lip vs Gut. Remove duplicate genes, keeping the first occurrence (the Gut vs Eye row).
merged_Gut_unique <-merged_Gut_df[!duplicated(merged_Gut_df$gene), ]
In [96]:
head(merged_Gut_unique)
A tibble: 6 × 7
gene                       baseMean     log2FoldChange  lfcSE      stat       pvalue        padj
<chr>                      <dbl>        <dbl>           <dbl>      <dbl>      <dbl>         <dbl>
TRINITY_DN0_c0_g1_i2       87.728134    4.116102        0.7598392  5.379869   7.454017e-08  4.057540e-06
TRINITY_DN0_c0_g4_i2       223.765662   2.713450        0.8205288  3.292472   9.931093e-04  1.222380e-02
TRINITY_DN10004_c4_g1_i1   1.799057     3.472197        1.0458778  3.299393   9.689408e-04  1.200249e-02
TRINITY_DN10008_c0_g1_i1   8.605574     4.926932        1.2519127  3.917817   8.935435e-05  1.798068e-03
TRINITY_DN100098_c0_g1_i1  17.835682    7.063379        0.9234800  7.363776   1.787794e-13  3.867227e-11
TRINITY_DN1000_c1_g1_i2    134.170410   1.370024        0.3804016  3.600218   3.179503e-04  4.979644e-03

Venn Diagram - Extract tissue-specific genes

In [99]:
#generate a venn diagram to visualize shared significantly upregulated genes across tissue types and extract the genes that are unique to each tissue type. 
unique_venn_list <- list(
  Upper_Lip = merged_upper_lips_unique$gene  , 
  Gut = merged_Gut_unique$gene,
  Compound_Eye = merged_Eye_unique$gene
)

ggvenn_unique <- ggvenn(
  unique_venn_list, 
  fill_color = c('#F1AFB4','#C97D97', '#F2C93D'),
  stroke_size = .7, set_name_size = 6, text_size = 5
)

ggvenn_unique
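The counts drawn in each Venn region can also be tabulated directly. This is a useful cross-check on the anti_join() counts further down, because the region belonging to a single tissue should equal nrow() of that tissue's unique_genes_* table. Below is a base R sketch; venn_regions() is a hypothetical helper and the gene IDs are made up.

```r
# Count genes in every region of a Venn diagram (hypothetical helper).
venn_regions <- function(sets) {
  all_genes <- unique(unlist(sets))
  # Membership matrix: one row per gene, one logical column per set.
  member <- sapply(sets, function(s) all_genes %in% s)
  # Label each gene with the combination of sets that contain it.
  combo <- apply(member, 1, function(r) paste(names(sets)[r], collapse = "&"))
  table(combo)
}

# Made-up gene IDs standing in for unique_venn_list.
sets <- list(
  Upper_Lip    = c("g1", "g2", "g3"),
  Gut          = c("g3", "g4"),
  Compound_Eye = c("g1", "g3", "g5")
)
venn_regions(sets)
```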
In [2]:
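# open a PDF device, draw the Venn diagram, and close the device
pdf("Non_Luminous_DGE.pdf", width = 8, height = 6)
# explicit print() ensures the plot is drawn even inside source(), a function, or a loop
print(ggvenn_unique)
dev.off()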
In [100]:
#prep dataframes for extraction
Upper_Lip <- as.data.frame(merged_upper_lips_unique$gene)
colnames(Upper_Lip)[1]<- "gene"
Gut <- as.data.frame(merged_Gut_unique$gene)
colnames(Gut)[1]<- "gene"
Compound_eye <- as.data.frame(merged_Eye_unique$gene)
colnames(Compound_eye)[1]<- "gene"
In [101]:
# compare and extract unique genes for each tissue type
unique_genes_upper_lip <- anti_join(Upper_Lip, Gut, by = "gene") %>%
  anti_join(Compound_eye, by = "gene")

unique_genes_gut <- anti_join(Gut, Upper_Lip, by = "gene") %>%
  anti_join(Compound_eye, by = "gene")

unique_genes_compound_eye <- anti_join(Compound_eye, Upper_Lip, by = "gene") %>%
  anti_join(Gut, by = "gene")
In [102]:
nrow(unique_genes_upper_lip)
934
In [103]:
nrow(unique_genes_gut)
4266
In [104]:
nrow(unique_genes_compound_eye)
800
In [105]:
#recover the DESeq2 statistics of the tissue-specific genes by subsetting the merged tables (annotations are added further below)
In [106]:
unique_genes_upper_lip_info  <- merged_upper_lips_unique %>%
  filter(gene %in% unique_genes_upper_lip$gene)
In [107]:
nrow(unique_genes_upper_lip_info)
934
In [108]:
unique_genes_eye_info  <- merged_Eye_unique %>%
  filter(gene %in% unique_genes_compound_eye$gene)
In [109]:
nrow(unique_genes_eye_info)
800
In [110]:
unique_genes_gut_info  <- merged_Gut_unique %>%
  filter(gene %in% unique_genes_gut$gene)
In [111]:
nrow(unique_genes_gut_info)
4266
In [112]:
#add annotations back for upper lip
colnames(unique_genes_upper_lip_info)[1]<- "transcript_id"
unique_genes_upper_lip_info_annot <- left_join(unique_genes_upper_lip_info,Trinotate_lym_subset_skogs,by="transcript_id")
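A left join returns one output row per matching pair. If the annotation table has more than one row per transcript_id (a Trinotate report can, for example, list one row per predicted protein), the annotated tables can have more rows than the 934, 800, and 4266 unique genes counted above. Whether Trinotate_lym_subset_skogs has repeated IDs is not shown here. Below is a base R sketch of the check using made-up data.

```r
# Made-up DE results and an annotation table where transcript t1 has two rows.
de <- data.frame(transcript_id = c("t1", "t2"), log2FoldChange = c(3.1, 2.2),
                 stringsAsFactors = FALSE)
annot <- data.frame(transcript_id = c("t1", "t1", "t2"),
                    prot_id = c("t1.p1", "t1.p2", "t2.p1"),
                    stringsAsFactors = FALSE)

# Check for repeated keys before joining.
any(duplicated(annot$transcript_id))   # TRUE

# Base R left join (all.x = TRUE keeps every DE transcript).
joined <- merge(de, annot, by = "transcript_id", all.x = TRUE)
nrow(joined)                           # 3, one more row than nrow(de)
```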
In [271]:
#write.csv(unique_genes_upper_lip_info_annot, file = "df_Skogs_sigfig_upreg_unique_Upper_Lip.csv")
In [114]:
#add annotations back for eyes 
colnames(unique_genes_eye_info)[1]<- "transcript_id"
unique_genes_eye_info_annot <- left_join(unique_genes_eye_info,Trinotate_lym_subset_skogs,by="transcript_id")
In [272]:
#write.csv(unique_genes_eye_info_annot, file = "df_Skogs_sigfig_upreg_unique_comEye.csv")
In [115]:
#add annotations back for gut 
colnames(unique_genes_gut_info)[1] <- "transcript_id"
unique_genes_gut_info_annot <- left_join(unique_genes_gut_info,Trinotate_lym_subset_skogs,by="transcript_id")
In [228]:
#write.csv(unique_genes_gut_info_annot, file = "df_Skogs_sigfig_upreg_unique_Gut.csv")

Identify the number of shared significantly upregulated genes between tissue types across V. tsujii and Skogsbergia sp.

The numbers of uniquely upregulated genes in the upper lips and compound eyes were compared between the luminous (V. tsujii) and non-luminous (Skogsbergia sp.) ostracods, using orthogroups to link genes across the two species.

Determine the number of shared significantly upregulated genes between compound eyes

Import

In [116]:
#import V.tsujii compound eyes 
vtsujii_eye <- read.csv("df_Vtsujii_sigfig_upreg_unique_comEye.csv", header = TRUE, row.names=1,
                  stringsAsFactors = FALSE)
In [117]:
#change column name 
colnames(vtsujii_eye)[1]<- "gene"
In [118]:
vargulatsujii_v_skogsbergia_orthologs_factor <- read.csv("Vargula_tsujii_cdhit_95.fasta.transdecoder__v__Skogsbergia_sp.csv")
In [119]:
head(vargulatsujii_v_skogsbergia_orthologs_factor)
A data.frame: 6 × 3
   Orthogroup  Vargula_tsujii_cdhit_95.fasta.transdecoder  Skogsbergia_sp
   <chr>       <chr>                                       <chr>
1  OG0000000  NODE_10135_length_1998_cov_0.0936696_g1431_i1.p1, NODE_29_length_9922_cov_21.9983_g21_i0.p1, NODE_51985_length_298_cov_2.3786_g44765_i0.p1, NODE_1757_length_4178_cov_0.681057_g1230_i0.p1, NODE_5045_length_2903_cov_0.0983146_g3589_i0.p1, NODE_6915_length_2515_cov_26.5618_g4907_i0.p1  TRINITY_DN104_c0_g1_i13.p1, TRINITY_DN48525_c0_g1_i2.p1, TRINITY_DN104_c2_g4_i1.p1, TRINITY_DN11374_c0_g1_i5.p2, TRINITY_DN11454_c1_g1_i4.p1, TRINITY_DN36329_c1_g1_i1.p1, TRINITY_DN12263_c0_g1_i1.p1, TRINITY_DN18470_c0_g2_i1.p1, TRINITY_DN4890_c1_g1_i6.p1, TRINITY_DN1412_c0_g3_i10.p4, TRINITY_DN1412_c0_g4_i2.p1, TRINITY_DN4561_c0_g1_i12.p1, TRINITY_DN104_c2_g2_i3.p1, TRINITY_DN6718_c0_g3_i6.p1, TRINITY_DN67800_c0_g1_i1.p1, TRINITY_DN11716_c0_g2_i1.p1, TRINITY_DN25498_c0_g1_i1.p1, TRINITY_DN15017_c0_g1_i1.p1, TRINITY_DN19426_c0_g1_i2.p1, TRINITY_DN11454_c0_g2_i2.p1, TRINITY_DN6718_c0_g1_i1.p1, TRINITY_DN10747_c0_g1_i7.p1, TRINITY_DN25693_c0_g1_i1.p1, TRINITY_DN35713_c0_g1_i1.p1, TRINITY_DN47373_c0_g1_i1.p1, TRINITY_DN57055_c0_g1_i1.p1, TRINITY_DN3755_c0_g1_i4.p1, TRINITY_DN36596_c0_g1_i1.p1, TRINITY_DN31005_c0_g1_i1.p1, TRINITY_DN36867_c0_g1_i1.p1
2  OG0000000  NODE_6981_length_2505_cov_1.1751_g4950_i0.p1, NODE_1965_length_4053_cov_7.00175_g1371_i0.p1, NODE_1695_length_4213_cov_7.70635_g1185_i0.p1, NODE_5896_length_2710_cov_1.26591_g4187_i0.p1, NODE_2126_length_3968_cov_23.5955_g1482_i0.p1, NODE_3679_length_3288_cov_7.39839_g2592_i0.p1, NODE_2677_length_3689_cov_3.17474_g1860_i0.p1, NODE_1433_length_4447_cov_35.7361_g986_i0.p1, NODE_2978_length_3542_cov_86.2509_g1026_i1.p1, NODE_1714_length_4204_cov_1.97132_g1201_i0.p1, NODE_2412_length_3813_cov_4.34646_g1680_i0.p1, NODE_1889_length_4101_cov_16.4234_g1317_i0.p1, NODE_4638_length_3008_cov_0_g3278_i0.p1, NODE_1946_length_4074_cov_0_g1356_i0.p1  TRINITY_DN70787_c0_g1_i1.p1
3  OG0000000  NODE_1668_length_4233_cov_0.215414_g1162_i0.p1, NODE_46195_length_337_cov_2.2766_g38975_i0.p1, NODE_46862_length_332_cov_2.27798_g39642_i0.p1, NODE_28371_length_630_cov_3.02783_g21604_i0.p1, NODE_49746_length_312_cov_2.03113_g42526_i0.p1, NODE_4993_length_2913_cov_4.50245_g415_i6.p1, NODE_5572_length_2776_cov_6.68394_g3957_i0.p1, NODE_32876_length_511_cov_2.40132_g25833_i0.p2  TRINITY_DN26605_c0_g1_i1.p1, TRINITY_DN7221_c2_g2_i2.p1, TRINITY_DN27399_c0_g1_i1.p2, TRINITY_DN52051_c0_g1_i1.p1
4  OG0000000  NODE_420_length_6078_cov_26.8343_g280_i0.p1  TRINITY_DN3439_c0_g1_i9.p1, TRINITY_DN48719_c0_g1_i2.p1, TRINITY_DN8716_c0_g1_i6.p1, TRINITY_DN8716_c2_g1_i1.p1, TRINITY_DN3746_c1_g1_i10.p2, TRINITY_DN3746_c1_g1_i10.p1, TRINITY_DN43198_c0_g1_i1.p1
5  OG0000000  NODE_5291_length_2837_cov_5.55895_g3765_i0.p1  TRINITY_DN24813_c0_g1_i1.p1, TRINITY_DN13581_c0_g1_i4.p1, TRINITY_DN96293_c0_g1_i1.p1
6  OG0000001  NODE_38160_length_421_cov_1.40984_g30995_i0.p1, NODE_43900_length_357_cov_1.58278_g36681_i0.p1  TRINITY_DN49841_c0_g1_i2.p1, TRINITY_DN9841_c1_g1_i7.p1

Determine the orthogroups that contain V. tsujii significantly upregulated compound eye genes

In [120]:
#find the orthogroups that contain V. tsujii DE compound eye transcripts
vtsujii_DE_eye <- subset(vtsujii_eye, select = "gene")
In [121]:
head(vtsujii_DE_eye)
A data.frame: 6 × 1
   gene
   <chr>
1  NODE_10126_length_1999_cov_2.60802_g7138_i0
2  NODE_101633_length_138_cov_2.06024_g94413_i0
3  NODE_10232_length_1983_cov_42.0814_g7210_i0
4  NODE_102471_length_136_cov_2.41975_g95251_i0
5  NODE_10387_length_1964_cov_13.2829_g7315_i0
6  NODE_10411_length_1962_cov_1.81279_g354_i2
In [122]:
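# helper: return the orthogroup rows whose V. tsujii member list contains each df1$gene
# (column names are hard-coded, so adapt them before reusing the function for another species)

check_and_subset <- function(df1, df2) {
  matched_rows_list <- list()
  for (i in seq_len(nrow(df1))) {
    char_row <- df1[i, "gene"]
    # the ID is used as a regular expression, so it matches anywhere in the member list
    matched_rows <- df2[str_detect(df2$Vargula_tsujii_cdhit_95.fasta.transdecoder, paste0(char_row, collapse = "|")), , drop = FALSE]
    if (nrow(matched_rows) > 0) {
      matched_rows_list[[i]] <- matched_rows
    }
  }
  # bind_rows() skips the NULL slots left by genes without a match
  if (length(matched_rows_list) > 0) {
    matched_df <- bind_rows(matched_rows_list)
    return(matched_df)
  } else {
    return(NULL)
  }
}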
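check_and_subset() passes each ID to str_detect() as a regular expression, so it matches anywhere inside the comma-separated member list. As a result, an ID such as TRINITY_DN1015_c0_g1_i4 would also match a member TRINITY_DN1015_c0_g1_i40.p1 if that isoform existed, and the dots in V. tsujii NODE IDs match any character. Whether this changes the orthogroup counts reported below has not been checked here. Below is a vectorized base R sketch that instead splits the members and matches exact transcript IDs. It assumes members are separated by commas and carry TransDecoder ".pN" suffixes, as in the tables above; the orthogroup table and IDs are made up.

```r
# Made-up orthogroup table: members stored as comma-separated protein IDs.
og <- data.frame(
  Orthogroup = c("OG1", "OG2"),
  Skogsbergia_sp = c("TRINITY_DN1015_c0_g1_i40.p1",
                     "TRINITY_DN1015_c0_g1_i4.p1, TRINITY_DN7_c0_g1_i1.p2"),
  stringsAsFactors = FALSE
)
de_ids <- c("TRINITY_DN1015_c0_g1_i4")

# Expand to one row per member protein.
members <- strsplit(og$Skogsbergia_sp, ",\\s*")
long <- data.frame(
  Orthogroup = rep(og$Orthogroup, lengths(members)),
  protein = unlist(members),
  stringsAsFactors = FALSE
)
# Strip the TransDecoder ".pN" suffix to recover the transcript ID.
long$transcript_id <- sub("\\.p[0-9]+$", "", long$protein)

# Exact matching returns each orthogroup once: i4 no longer matches i40.
hits <- unique(long$Orthogroup[long$transcript_id %in% de_ids])
hits             # "OG2"

# Substring matching (what str_detect does with a plain ID) also picks up OG1.
substring_hits <- og$Orthogroup[grepl(de_ids[1], og$Skogsbergia_sp)]
substring_hits   # "OG1" "OG2"
```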
In [123]:
# Use the function 
vtsujii_DE_eye_orthogroups <- check_and_subset(vtsujii_DE_eye,
                                               vargulatsujii_v_skogsbergia_orthologs_factor)
In [10]:
head(vtsujii_DE_eye_orthogroups)
A data.frame: 6 × 3
   Orthogroup  Vargula_tsujii_cdhit_95.fasta.transdecoder  Skogsbergia_sp
   <chr>       <chr>                                       <chr>
1  OG0000024  NODE_10126_length_1999_cov_2.60802_g7138_i0.p1, NODE_15785_length_1351_cov_2.65432_g7138_i1.p1, NODE_25112_length_746_cov_273.855_g18649_i0.p1  TRINITY_DN1563_c1_g2_i1.p1
2  OG0000025  NODE_10411_length_1962_cov_1.81279_g354_i2.p1, NODE_519_length_5792_cov_3.53878_g354_i0.p1, NODE_24375_length_778_cov_3.99308_g18021_i0.p1  TRINITY_DN8660_c0_g1_i1.p1
3  OG0000645  NODE_11307_length_1839_cov_125.879_g626_i29.p1, NODE_18319_length_1131_cov_180.691_g626_i35.p1, NODE_11481_length_1816_cov_116.28_g626_i30.p1  TRINITY_DN24096_c0_g1_i1.p1
4  OG0000645  NODE_11307_length_1839_cov_125.879_g626_i29.p1, NODE_18319_length_1131_cov_180.691_g626_i35.p1, NODE_11481_length_1816_cov_116.28_g626_i30.p1  TRINITY_DN24096_c0_g1_i1.p1
5  OG0000044  NODE_1784_length_4160_cov_76.1535_g626_i13.p1, NODE_1202_length_4664_cov_69.9696_g626_i2.p1, NODE_1590_length_4296_cov_69.1978_g626_i11.p1, NODE_1313_length_4552_cov_98.4063_g626_i4.p1, NODE_893_length_5050_cov_88.8799_g626_i0.p1, NODE_935_length_4989_cov_81.7586_g626_i1.p1  TRINITY_DN31536_c0_g1_i1.p1, TRINITY_DN61_c0_g1_i7.p1
6  OG0008641  NODE_1239_length_4637_cov_2.40244_g867_i0.p1  TRINITY_DN4567_c0_g1_i1.p1
In [125]:
vtsujii_DE_eye_orthogroups_numbers <- subset(vtsujii_DE_eye_orthogroups, select = "Orthogroup")
In [126]:
head(vtsujii_DE_eye_orthogroups_numbers)
A data.frame: 6 × 1
   Orthogroup
   <chr>
1  OG0000024
2  OG0000025
3  OG0000645
4  OG0000645
5  OG0000044
6  OG0008641
In [127]:
vtsujii_DE_eye_orthogroups_numbers_rmdup <- vtsujii_DE_eye_orthogroups_numbers %>% distinct()
In [128]:
nrow(vtsujii_DE_eye_orthogroups_numbers_rmdup)
160

Determine the orthogroups that contain Skogsbergia sp. significantly upregulated compound eye genes

In [129]:
skogs_DE_eye <- subset(unique_genes_compound_eye, select = "gene")
In [130]:
colnames(skogs_DE_eye)[1]<- "transcript_id"
In [131]:
head(skogs_DE_eye)
A data.frame: 6 × 1
   transcript_id
   <chr>
1  TRINITY_DN1015_c0_g1_i4
2  TRINITY_DN10186_c1_g1_i1
3  TRINITY_DN103043_c0_g1_i1
4  TRINITY_DN1040_c0_g1_i5
5  TRINITY_DN10657_c0_g1_i1
6  TRINITY_DN10680_c5_g1_i1
In [132]:
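# redefine the helper for Skogsbergia sp.: return the orthogroup rows whose Skogsbergia_sp member list contains each df1$transcript_id

check_and_subset <- function(df1, df2) {
  matched_rows_list <- list()
  for (i in seq_len(nrow(df1))) {
    char_row <- df1[i, "transcript_id"]
    # the ID is used as a regular expression, so it matches anywhere in the member list
    matched_rows <- df2[str_detect(df2$Skogsbergia_sp, paste0(char_row, collapse = "|")), , drop = FALSE]
    if (nrow(matched_rows) > 0) {
      matched_rows_list[[i]] <- matched_rows
    }
  }
  # bind_rows() skips the NULL slots left by IDs without a match
  if (length(matched_rows_list) > 0) {
    matched_df <- bind_rows(matched_rows_list)
    return(matched_df)
  } else {
    return(NULL)
  }
}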
In [133]:
# Use the function 
skogs_DE_eye_orthogroups <- check_and_subset(skogs_DE_eye, vargulatsujii_v_skogsbergia_orthologs_factor)
In [11]:
head(skogs_DE_eye_orthogroups)
A data.frame: 6 × 3
   Orthogroup  Vargula_tsujii_cdhit_95.fasta.transdecoder  Skogsbergia_sp
   <chr>       <chr>                                       <chr>
1  OG0002719  NODE_25869_length_718_cov_30.8974_g19325_i0.p1, NODE_17417_length_1203_cov_28.574_g12375_i0.p1, NODE_20324_length_993_cov_27.4659_g14611_i0.p1  TRINITY_DN1015_c0_g1_i4.p1
2  OG0000231  NODE_679_length_5418_cov_4.27485_g474_i0.p1  TRINITY_DN1040_c0_g1_i5.p1
3  OG0001041  NODE_40351_length_393_cov_30.6864_g33159_i0.p1, NODE_40160_length_395_cov_140.374_g32970_i0.p1, NODE_44143_length_354_cov_76.0334_g36923_i0.p1  TRINITY_DN1173_c0_g1_i2.p1
4  OG0001585  NODE_127_length_7789_cov_13.8001_g81_i0.p1  TRINITY_DN1354_c1_g6_i1.p1
5  OG0001264  NODE_13773_length_1553_cov_0.555407_g2165_i3.p1, NODE_3090_length_3492_cov_3.04772_g2165_i0.p1, NODE_4075_length_3173_cov_3.11514_g2165_i2.p1  TRINITY_DN1358_c1_g1_i15.p1
6  OG0001775  NODE_21463_length_922_cov_3.13741_g15517_i0.p1  TRINITY_DN14601_c0_g1_i5.p1
In [135]:
skogs_DE_eye_orthogroups_numbers <- subset(skogs_DE_eye_orthogroups, select = "Orthogroup" )
In [136]:
skogs_DE_eye_orthogroups_numbers_rmdup <- skogs_DE_eye_orthogroups_numbers %>% distinct()
In [137]:
nrow(skogs_DE_eye_orthogroups_numbers_rmdup)
80

Determine the number of orthogroups with significantly upregulated compound eye genes shared between V. tsujii and Skogsbergia sp.

In [12]:
head(vtsujii_DE_eye_orthogroups_numbers_rmdup$Orthogroup)

head(skogs_DE_eye_orthogroups_numbers_rmdup$Orthogroup)
  1. 'OG0000024'
  2. 'OG0000025'
  3. 'OG0000645'
  4. 'OG0000044'
  5. 'OG0008641'
  6. 'OG0000427'
  1. 'OG0002719'
  2. 'OG0000231'
  3. 'OG0001041'
  4. 'OG0001585'
  5. 'OG0001264'
  6. 'OG0001775'
In [139]:
# Convert the Orthogroup columns to character for the matching step below

vtsujii_DE_eye_orthogroups_numbers_rmdup <- vtsujii_DE_eye_orthogroups_numbers_rmdup %>% mutate(Orthogroup = as.character(Orthogroup))
In [140]:
skogs_DE_eye_orthogroups_numbers_rmdup <- skogs_DE_eye_orthogroups_numbers_rmdup %>% mutate(Orthogroup = as.character(Orthogroup))
In [141]:
# Start with the longer list (V. tsujii): each of its orthogroups is flagged "yes" if it also occurs in the Skogsbergia sp. list


vtsujii_skogs_orthogroups_numbers_match <- vtsujii_DE_eye_orthogroups_numbers_rmdup %>%
    mutate(match = c("no", "yes")[1 + (rowSums(
        outer(
            strsplit(Orthogroup, "\\s+"),
            strsplit(skogs_DE_eye_orthogroups_numbers_rmdup$Orthogroup, "\\s+"),
            Vectorize(function(x, y) all(x %in% y) | all(y %in% x))
        )
    ) > 0)])
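Every Orthogroup value is a single token with no whitespace. For that reason, the outer()/Vectorize() test in the cell above reduces to plain set membership. The toy check below runs both forms side by side on made-up ID vectors, which stand in for the two `_rmdup$Orthogroup` columns. `%in%` and `intersect()` are base R.

```r
# Hypothetical orthogroup ID vectors (toy stand-ins for the notebook columns)
vtsujii_og <- c("OG0000024", "OG0000044", "OG0008641", "OG0000427")
skogs_og   <- c("OG0000044", "OG0002719", "OG0000427")

# The outer()/Vectorize() construction from the notebook, on the toy data
match_outer <- c("no", "yes")[1 + (rowSums(
  outer(strsplit(vtsujii_og, "\\s+"),
        strsplit(skogs_og, "\\s+"),
        Vectorize(function(x, y) all(x %in% y) | all(y %in% x)))
) > 0)]

# Equivalent membership test when every ID is a single token
match_in <- ifelse(vtsujii_og %in% skogs_og, "yes", "no")

print(match_in)                         # [1] "no"  "yes" "no"  "yes"
print(intersect(vtsujii_og, skogs_og))  # [1] "OG0000044" "OG0000427"
```

The `%in%` form avoids building the full pairwise matrix. That matrix grows with the product of the two list lengths.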
In [142]:
head(vtsujii_skogs_orthogroups_numbers_match)
A data.frame: 6 × 2
  | Orthogroup | match
  | <chr> | <chr>
1 | OG0000024 | no
2 | OG0000025 | no
3 | OG0000645 | no
4 | OG0000044 | yes
5 | OG0008641 | yes
6 | OG0000427 | yes
In [143]:
vtsujii_skogs_orthogroups_numbers_match_yes <- vtsujii_skogs_orthogroups_numbers_match %>% filter(match=="yes")
In [144]:
vtsujii_skogs_orthogroups_numbers_match_yes
A data.frame: 23 × 2
Orthogroup | match
<chr> | <chr>
OG0000044 | yes
OG0008641 | yes
OG0000427 | yes
OG0001391 | yes
OG0008339 | yes
OG0002719 | yes
OG0000877 | yes
OG0005432 | yes
OG0008613 | yes
OG0000493 | yes
OG0016754 | yes
OG0001264 | yes
OG0003585 | yes
OG0000231 | yes
OG0004933 | yes
OG0005465 | yes
OG0000241 | yes
OG0001617 | yes
OG0000806 | yes
OG0001167 | yes
OG0000037 | yes
OG0000445 | yes
OG0005437 | yes
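These 23 orthogroups form the intersection of the two distinct-orthogroup lists. A two-set Venn diagram (for example with the ggvenn package loaded at the top of the notebook) needs the three region sizes, which follow from setdiff() and intersect(). Using the 80 distinct Skogsbergia sp. orthogroups counted earlier, 23 shared orthogroups leave 80 - 23 = 57 that occur only in Skogsbergia sp. The helper `venn_counts()` below is a hypothetical sketch run on toy IDs.

```r
# Hypothetical helper: sizes of the three regions of a two-set Venn diagram
venn_counts <- function(a, b) {
  a <- unique(a)  # duplicates would inflate the counts
  b <- unique(b)
  c(only_a = length(setdiff(a, b)),
    shared = length(intersect(a, b)),
    only_b = length(setdiff(b, a)))
}

# Toy IDs; in the notebook these would be the two *_rmdup$Orthogroup columns
vtsujii_og <- c("OG1", "OG2", "OG3", "OG3", "OG4")
skogs_og   <- c("OG3", "OG4", "OG5")
venn_counts(vtsujii_og, skogs_og)  # only_a = 2, shared = 2, only_b = 1
```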
In [145]:
# Extract the shared orthogroup IDs as a character vector


vtsujii_skogs_orthogroups_numbers_match_yes_ch <- as.character(vtsujii_skogs_orthogroups_numbers_match_yes$Orthogroup)
In [146]:
vargulatsujii_v_skogsbergia_SHARED_ORTHOGROUPS_EYES <-  vargulatsujii_v_skogsbergia_orthologs_factor %>% filter(Orthogroup %in% vtsujii_skogs_orthogroups_numbers_match_yes_ch)
In [147]:
head(vargulatsujii_v_skogsbergia_SHARED_ORTHOGROUPS_EYES)
A data.frame: 6 × 3
  | Orthogroup | Vargula_tsujii_cdhit_95.fasta.transdecoder | Skogsbergia_sp
  | <chr> | <chr> | <chr>
1 | OG0000037 | NODE_27262_length_666_cov_6.40917_g520_i34.p1 | TRINITY_DN2554_c0_g2_i10.p1
2 | OG0000044 | NODE_7263_length_2451_cov_14.1957_g5146_i0.p1 | TRINITY_DN15217_c0_g1_i1.p2, TRINITY_DN15217_c0_g1_i1.p1
3 | OG0000044 | NODE_297_length_6614_cov_0.573716_g195_i0.p1 | TRINITY_DN56424_c0_g1_i1.p1
4 | OG0000044 | NODE_4458_length_3060_cov_0.242263_g3155_i0.p1 | TRINITY_DN1381_c0_g1_i2.p2
5 | OG0000044 | NODE_523_length_5773_cov_37.5913_g357_i0.p1 | TRINITY_DN5581_c0_g1_i1.p1
6 | OG0000044 | NODE_1784_length_4160_cov_76.1535_g626_i13.p1, NODE_1202_length_4664_cov_69.9696_g626_i2.p1, NODE_1590_length_4296_cov_69.1978_g626_i11.p1, NODE_1313_length_4552_cov_98.4063_g626_i4.p1, NODE_893_length_5050_cov_88.8799_g626_i0.p1, NODE_935_length_4989_cov_81.7586_g626_i1.p1 | TRINITY_DN31536_c0_g1_i1.p1, TRINITY_DN61_c0_g1_i7.p1

Extract all V. tsujii genes significantly upregulated in the compound eye that belong to orthogroups shared by both species' compound eyes

In [148]:
# Extract the V. tsujii transcript lists for the shared orthogroups
vargulatsujii_v_skogsbergia_SHARED_ORTHOGROUPS_EYES_transcript_ids <- as.character(vargulatsujii_v_skogsbergia_SHARED_ORTHOGROUPS_EYES$Vargula_tsujii_cdhit_95.fasta.transdecoder)
In [149]:
vargulatsujii_v_skogsbergia_SHARED_ORTHOGROUPS_EYES_transcript_ids_unlist  <- unlist(strsplit(vargulatsujii_v_skogsbergia_SHARED_ORTHOGROUPS_EYES_transcript_ids,","))
In [150]:
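# Remove the TransDecoder ORF suffixes (.p1, .p2); the dot is escaped so it
# matches a literal "." rather than any character
vargulatsujii_v_skogsbergia_SHARED_ORTHOGROUPS_EYES_transcript_ids_unlist_removep <- vargulatsujii_v_skogsbergia_SHARED_ORTHOGROUPS_EYES_transcript_ids_unlist %>%
  str_replace("\\.p1", "") %>%
  str_replace("\\.p2", "")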
In [151]:
# strsplit() on "," leaves a leading space on every ID that followed ", ", so trim whitespace

vargulatsujii_v_skogsbergia_SHARED_ORTHOGROUPS_EYES_transcript_ids_unlist_removep_removespace <- trimws(vargulatsujii_v_skogsbergia_SHARED_ORTHOGROUPS_EYES_transcript_ids_unlist_removep)
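The cells above strip only the .p1 and .p2 TransDecoder suffixes. The full ortholog table also contains IDs ending in .p3 and .p4, for example TRINITY_DN1412_c0_g3_i10.p4 in the OG0000000 rows. The sketch below assumes every suffix has the form `.p<number>` at the end of the ID. It performs the split, trim and strip steps in one hypothetical helper, `peptide_cells_to_transcripts()`, which works on either species' column. The final `unique()` is an addition and does not change the result of the later `%in%` filter.

```r
# Hypothetical helper (not part of the original notebook): turn a column of
# comma-separated TransDecoder peptide IDs into clean transcript IDs
peptide_cells_to_transcripts <- function(cells) {
  ids <- unlist(strsplit(as.character(cells), ","))  # one ID per element
  ids <- trimws(ids)                                 # drop the space after ", "
  ids <- sub("\\.p[0-9]+$", "", ids)                 # drop .p1, .p2, ..., .p12
  unique(ids)                                        # collapse .p1/.p2 of one transcript
}

cells <- c("NODE_7263_length_2451_cov_14.1957_g5146_i0.p1",
           "TRINITY_DN15217_c0_g1_i1.p2, TRINITY_DN15217_c0_g1_i1.p1",
           "TRINITY_DN1412_c0_g3_i10.p4")
peptide_cells_to_transcripts(cells)
```

Anchoring the pattern with `$` removes only a trailing suffix. Trimming first guarantees that no trailing space blocks the anchor.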
In [152]:
vtsujii_eye_orthogroups_transcript_ids <- vtsujii_eye %>% filter(gene %in% vargulatsujii_v_skogsbergia_SHARED_ORTHOGROUPS_EYES_transcript_ids_unlist_removep_removespace)
In [13]:
head(vtsujii_eye_orthogroups_transcript_ids)
A data.frame: 6 × 22
  | gene | baseMean | log2FoldChange | lfcSE | stat | pvalue | padj | X.gene_id | sprot_Top_BLASTX_hit | RNAMMER | sprot_Top_BLASTP_hit | Pfam | SignalP | TmHMM | eggnog | Kegg | gene_ontology_blast | gene_ontology_pfam | transcript | peptide
  | <chr> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <chr> | <chr> | <chr> | <chr> | <chr> | <chr> | <chr> | <chr> | <chr> | <chr> | <chr> | <chr> | <chr>
1NODE_1202_length_4664_cov_69.9696_g626_i2 329.56863 -9.7534530.8494682-10.8946551.222295e-271.346236e-24g626 TGMH_TACTR^TGMH_TACTR^Q:1793-3535,H:203-764^45.533%ID^E:1.41e-173^RecName: Full=Hemocyte protein-glutamine gamma-glutamyltransferase;^Eukaryota; Metazoa; Ecdysozoa; Arthropoda; Chelicerata; Merostomata; Xiphosura; Limulidae; Tachypleus .TGMH_TACTR^TGMH_TACTR^Q:598-1178,H:203-764^45.533%ID^E:1.04e-179^RecName: Full=Hemocyte protein-glutamine gamma-glutamyltransferase;^Eukaryota; Metazoa; Ecdysozoa; Arthropoda; Chelicerata; Merostomata; Xiphosura; Limulidae; Tachypleus PF01841.19^Transglut_core^Transglutaminase-like superfamily^733-837^E:2.1e-16`PF00927.22^Transglut_C^Transglutaminase family, C-terminal ig like domain^978-1068^E:2.6e-14`PF00927.22^Transglut_C^Transglutaminase family, C-terminal ig like domain^1081-1173^E:7.7e-10 .. . . GO:0016020^cellular_component^membrane`GO:0046872^molecular_function^metal ion binding`GO:0003810^molecular_function^protein-glutamine gamma-glutamyltransferase activity`GO:0018149^biological_process^peptide cross-linking GO:0003810^molecular_function^protein-glutamine gamma-glutamyltransferase activity`GO:0018149^biological_process^peptide cross-linking..
2NODE_1239_length_4637_cov_2.40244_g867_i0 189.53605 -8.7658970.7843209-10.7431736.381261e-276.389382e-24g867 . .. . .. . . . . ..
3NODE_12527_length_1690_cov_1.46239_g6648_i2 18.15362 -7.7903261.1190367 -6.7539471.438768e-113.482767e-09g6648ARRH_HELVI^ARRH_HELVI^Q:1688-561,H:2-376^57.256%ID^E:3.61e-139^RecName: Full=Arrestin homolog;^Eukaryota; Metazoa; Ecdysozoa; Arthropoda; Hexapoda; Insecta; Pterygota; Neoptera; Holometabola; Lepidoptera; Glossata; Ditrysia; Noctuoidea; Noctuidae; Heliothinae; Heliothis .ARRH_HELVI^ARRH_HELVI^Q:1-376,H:2-376^57.743%ID^E:3.34e-151^RecName: Full=Arrestin homolog;^Eukaryota; Metazoa; Ecdysozoa; Arthropoda; Hexapoda; Insecta; Pterygota; Neoptera; Holometabola; Lepidoptera; Glossata; Ditrysia; Noctuoidea; Noctuidae; Heliothinae; Heliothis PF00339.29^Arrestin_N^Arrestin (or S-antigen), N-terminal domain^18-172^E:1.1e-30`PF02752.22^Arrestin_C^Arrestin (or S-antigen), C-terminal domain^193-350^E:2.2e-26 .. . . GO:0007165^biological_process^signal transduction . ..
4NODE_1313_length_4552_cov_98.4063_g626_i4 2470.28327-10.4049021.1304713 -8.9806622.691424e-191.209932e-16g626 TGMH_TACTR^TGMH_TACTR^Q:1793-3529,H:203-762^45.345%ID^E:1.43e-174^RecName: Full=Hemocyte protein-glutamine gamma-glutamyltransferase;^Eukaryota; Metazoa; Ecdysozoa; Arthropoda; Chelicerata; Merostomata; Xiphosura; Limulidae; Tachypleus .TGMH_TACTR^TGMH_TACTR^Q:598-1176,H:203-762^45.345%ID^E:1.06e-180^RecName: Full=Hemocyte protein-glutamine gamma-glutamyltransferase;^Eukaryota; Metazoa; Ecdysozoa; Arthropoda; Chelicerata; Merostomata; Xiphosura; Limulidae; Tachypleus PF01841.19^Transglut_core^Transglutaminase-like superfamily^733-837^E:1.6e-16`PF00927.22^Transglut_C^Transglutaminase family, C-terminal ig like domain^978-1068^E:2.6e-14`PF00927.22^Transglut_C^Transglutaminase family, C-terminal ig like domain^1081-1173^E:5.1e-09 .. . . GO:0016020^cellular_component^membrane`GO:0046872^molecular_function^metal ion binding`GO:0003810^molecular_function^protein-glutamine gamma-glutamyltransferase activity`GO:0018149^biological_process^peptide cross-linking GO:0003810^molecular_function^protein-glutamine gamma-glutamyltransferase activity`GO:0018149^biological_process^peptide cross-linking..
5NODE_13576_length_1573_cov_1750.19_g9510_i031179.93849 -8.6398200.8838926 -9.7248262.363096e-221.445952e-19g9510ARRH_LOCMI^ARRH_LOCMI^Q:134-1258,H:19-402^57.812%ID^E:7.49e-163^RecName: Full=Arrestin homolog;^Eukaryota; Metazoa; Ecdysozoa; Arthropoda; Hexapoda; Insecta; Pterygota; Neoptera; Polyneoptera; Orthoptera; Caelifera; Acrididea; Acridomorpha; Acridoidea; Acrididae; Oedipodinae; Locusta.ARRH_LOCMI^ARRH_LOCMI^Q:11-385,H:19-402^58.073%ID^E:5.22e-165^RecName: Full=Arrestin homolog;^Eukaryota; Metazoa; Ecdysozoa; Arthropoda; Hexapoda; Insecta; Pterygota; Neoptera; Polyneoptera; Orthoptera; Caelifera; Acrididea; Acridomorpha; Acridoidea; Acrididae; Oedipodinae; LocustaPF00339.29^Arrestin_N^Arrestin (or S-antigen), N-terminal domain^18-171^E:2.8e-25`PF02752.22^Arrestin_C^Arrestin (or S-antigen), C-terminal domain^192-344^E:3e-22 .. . . GO:0007165^biological_process^signal transduction . ..
6NODE_13753_length_1554_cov_32.6024_g9635_i0 114.67605 -2.5216750.4241977 -5.9431992.795121e-094.965398e-07g9635GOLI_DROME^GOLI_DROME^Q:475-1167,H:140-368^48.052%ID^E:2.95e-59^RecName: Full=Protein goliath;^Eukaryota; Metazoa; Ecdysozoa; Arthropoda; Hexapoda; Insecta; Pterygota; Neoptera; Holometabola; Diptera; Brachycera; Muscomorpha; Ephydroidea; Drosophilidae; Drosophila; Sophophora .GOLI_DROME^GOLI_DROME^Q:85-351,H:100-368^45.018%ID^E:8.06e-72^RecName: Full=Protein goliath;^Eukaryota; Metazoa; Ecdysozoa; Arthropoda; Hexapoda; Insecta; Pterygota; Neoptera; Holometabola; Diptera; Brachycera; Muscomorpha; Ephydroidea; Drosophilidae; Drosophila; Sophophora PF13639.6^zf-RING_2^Ring finger domain^285-327^E:6.8e-10`PF17123.5^zf-RING_11^RING-like zinc finger^286-313^E:6.7e-08`PF13923.6^zf-C3HC4_2^Zinc finger, C3HC4 type (RING finger)^286-326^E:8.2e-08`PF00097.25^zf-C3HC4^Zinc finger, C3HC4 type (RING finger)^286-326^E:1.1e-07`PF12678.7^zf-rbx1^RING-H2 zinc finger domain^288-327^E:1.8e-07.ExpAA=44.23^PredHel=2^Topology=o11-33i217-239oENOG41121N2^zinc ion bindingKEGG:dme:Dmel_CG2679GO:0005768^cellular_component^endosome`GO:0016021^cellular_component^integral component of membrane`GO:0005634^cellular_component^nucleus`GO:0003677^molecular_function^DNA binding`GO:0061630^molecular_function^ubiquitin protein ligase activity`GO:0008270^molecular_function^zinc ion binding`GO:0001707^biological_process^mesoderm formation`GO:0016567^biological_process^protein ubiquitination`GO:0006511^biological_process^ubiquitin-dependent protein catabolic processGO:0046872^molecular_function^metal ion binding`GO:0008270^molecular_function^zinc ion binding ..
In [154]:
nrow(vtsujii_eye_orthogroups_transcript_ids )
36

Extract all Skogsbergia sp. genes significantly upregulated in the compound eye that belong to orthogroups shared by both species' compound eyes

In [155]:
# Extract the Skogsbergia sp. transcript lists for the shared orthogroups

vargulatsujii_v_skogsbergia_SHARED_ORTHOGROUPS_EYES_skogs_transcript_ids <- as.character(vargulatsujii_v_skogsbergia_SHARED_ORTHOGROUPS_EYES$Skogsbergia_sp)
In [156]:
vargulatsujii_v_skogsbergia_SHARED_ORTHOGROUPS_EYES_skogs_transcript_ids_unlist  <- unlist(strsplit(vargulatsujii_v_skogsbergia_SHARED_ORTHOGROUPS_EYES_skogs_transcript_ids,","))


# Remove the TransDecoder ORF suffixes (.p1, .p2); the dot is escaped so it
# matches a literal "." rather than any character
vargulatsujii_v_skogsbergia_SHARED_ORTHOGROUPS_EYES_skogs_transcript_ids_unlist_removep <- vargulatsujii_v_skogsbergia_SHARED_ORTHOGROUPS_EYES_skogs_transcript_ids_unlist %>%
  str_replace("\\.p1", "") %>%
  str_replace("\\.p2", "")

# strsplit() on "," leaves a leading space on every ID that followed ", ", so trim whitespace

vargulatsujii_v_skogsbergia_SHARED_ORTHOGROUPS_EYES_skogs_transcript_ids_unlist_removep_removespace <- trimws(vargulatsujii_v_skogsbergia_SHARED_ORTHOGROUPS_EYES_skogs_transcript_ids_unlist_removep)
In [157]:
skogs_eye_orthogroups_transcript_ids <- skogs_DE_eye %>% filter(transcript_id %in% vargulatsujii_v_skogsbergia_SHARED_ORTHOGROUPS_EYES_skogs_transcript_ids_unlist_removep_removespace)
In [158]:
skogs_eye_orthogroups_transcript_ids
A data.frame: 26 × 1
transcript_id
<chr>
TRINITY_DN1015_c0_g1_i4
TRINITY_DN1040_c0_g1_i5
TRINITY_DN1358_c1_g1_i15
TRINITY_DN14639_c0_g1_i1
TRINITY_DN1501_c1_g1_i4
TRINITY_DN1673_c2_g1_i1
TRINITY_DN17206_c0_g1_i7
TRINITY_DN27541_c1_g1_i2
TRINITY_DN3565_c0_g1_i2
TRINITY_DN36346_c0_g1_i2
TRINITY_DN3699_c1_g1_i2
TRINITY_DN3729_c0_g1_i3
TRINITY_DN4567_c0_g1_i1
TRINITY_DN5581_c0_g1_i1
TRINITY_DN56424_c0_g1_i1
TRINITY_DN57934_c0_g1_i1
TRINITY_DN585_c0_g1_i1
TRINITY_DN61_c0_g1_i7
TRINITY_DN66275_c0_g1_i1
TRINITY_DN7283_c0_g1_i2
TRINITY_DN17432_c0_g1_i1
TRINITY_DN2554_c0_g2_i10
TRINITY_DN312_c0_g1_i1
TRINITY_DN3704_c0_g1_i1
TRINITY_DN38_c0_g1_i2
TRINITY_DN536_c1_g1_i5
In [159]:
nrow(skogs_eye_orthogroups_transcript_ids)
26

Determine the number of shared significantly upregulated genes between the luminous upper lip (V. tsujii) and the non-luminous upper lip (Skogsbergia sp.)

Import

In [160]:
# Import the V. tsujii genes significantly upregulated (unique) in the bioluminescent upper lip
vtsujii_bio_upper_lip <- read.csv("df_Vtsujii_sigfig_upreg_unique_BioUpperLip.csv", header = TRUE, row.names=1,
                  stringsAsFactors = FALSE)
In [161]:
# Reuse the all-orthologs table already imported in Section 5.1.1
head(vargulatsujii_v_skogsbergia_orthologs_factor)
A data.frame: 7691 × 3
Orthogroup | Vargula_tsujii_cdhit_95.fasta.transdecoder | Skogsbergia_sp
<chr> | <chr> | <chr>
OG0000000NODE_10135_length_1998_cov_0.0936696_g1431_i1.p1, NODE_29_length_9922_cov_21.9983_g21_i0.p1, NODE_51985_length_298_cov_2.3786_g44765_i0.p1, NODE_1757_length_4178_cov_0.681057_g1230_i0.p1, NODE_5045_length_2903_cov_0.0983146_g3589_i0.p1, NODE_6915_length_2515_cov_26.5618_g4907_i0.p1 TRINITY_DN104_c0_g1_i13.p1, TRINITY_DN48525_c0_g1_i2.p1, TRINITY_DN104_c2_g4_i1.p1, TRINITY_DN11374_c0_g1_i5.p2, TRINITY_DN11454_c1_g1_i4.p1, TRINITY_DN36329_c1_g1_i1.p1, TRINITY_DN12263_c0_g1_i1.p1, TRINITY_DN18470_c0_g2_i1.p1, TRINITY_DN4890_c1_g1_i6.p1, TRINITY_DN1412_c0_g3_i10.p4, TRINITY_DN1412_c0_g4_i2.p1, TRINITY_DN4561_c0_g1_i12.p1, TRINITY_DN104_c2_g2_i3.p1, TRINITY_DN6718_c0_g3_i6.p1, TRINITY_DN67800_c0_g1_i1.p1, TRINITY_DN11716_c0_g2_i1.p1, TRINITY_DN25498_c0_g1_i1.p1, TRINITY_DN15017_c0_g1_i1.p1, TRINITY_DN19426_c0_g1_i2.p1, TRINITY_DN11454_c0_g2_i2.p1, TRINITY_DN6718_c0_g1_i1.p1, TRINITY_DN10747_c0_g1_i7.p1, TRINITY_DN25693_c0_g1_i1.p1, TRINITY_DN35713_c0_g1_i1.p1, TRINITY_DN47373_c0_g1_i1.p1, TRINITY_DN57055_c0_g1_i1.p1, TRINITY_DN3755_c0_g1_i4.p1, TRINITY_DN36596_c0_g1_i1.p1, TRINITY_DN31005_c0_g1_i1.p1, TRINITY_DN36867_c0_g1_i1.p1
OG0000000NODE_6981_length_2505_cov_1.1751_g4950_i0.p1, NODE_1965_length_4053_cov_7.00175_g1371_i0.p1, NODE_1695_length_4213_cov_7.70635_g1185_i0.p1, NODE_5896_length_2710_cov_1.26591_g4187_i0.p1, NODE_2126_length_3968_cov_23.5955_g1482_i0.p1, NODE_3679_length_3288_cov_7.39839_g2592_i0.p1, NODE_2677_length_3689_cov_3.17474_g1860_i0.p1, NODE_1433_length_4447_cov_35.7361_g986_i0.p1, NODE_2978_length_3542_cov_86.2509_g1026_i1.p1, NODE_1714_length_4204_cov_1.97132_g1201_i0.p1, NODE_2412_length_3813_cov_4.34646_g1680_i0.p1, NODE_1889_length_4101_cov_16.4234_g1317_i0.p1, NODE_4638_length_3008_cov_0_g3278_i0.p1, NODE_1946_length_4074_cov_0_g1356_i0.p1 TRINITY_DN70787_c0_g1_i1.p1
OG0000000NODE_1668_length_4233_cov_0.215414_g1162_i0.p1, NODE_46195_length_337_cov_2.2766_g38975_i0.p1, NODE_46862_length_332_cov_2.27798_g39642_i0.p1, NODE_28371_length_630_cov_3.02783_g21604_i0.p1, NODE_49746_length_312_cov_2.03113_g42526_i0.p1, NODE_4993_length_2913_cov_4.50245_g415_i6.p1, NODE_5572_length_2776_cov_6.68394_g3957_i0.p1, NODE_32876_length_511_cov_2.40132_g25833_i0.p2 TRINITY_DN26605_c0_g1_i1.p1, TRINITY_DN7221_c2_g2_i2.p1, TRINITY_DN27399_c0_g1_i1.p2, TRINITY_DN52051_c0_g1_i1.p1
OG0000000NODE_420_length_6078_cov_26.8343_g280_i0.p1 TRINITY_DN3439_c0_g1_i9.p1, TRINITY_DN48719_c0_g1_i2.p1, TRINITY_DN8716_c0_g1_i6.p1, TRINITY_DN8716_c2_g1_i1.p1, TRINITY_DN3746_c1_g1_i10.p2, TRINITY_DN3746_c1_g1_i10.p1, TRINITY_DN43198_c0_g1_i1.p1
OG0000000NODE_5291_length_2837_cov_5.55895_g3765_i0.p1 TRINITY_DN24813_c0_g1_i1.p1, TRINITY_DN13581_c0_g1_i4.p1, TRINITY_DN96293_c0_g1_i1.p1
OG0000001NODE_38160_length_421_cov_1.40984_g30995_i0.p1, NODE_43900_length_357_cov_1.58278_g36681_i0.p1 TRINITY_DN49841_c0_g1_i2.p1, TRINITY_DN9841_c1_g1_i7.p1
OG0000001NODE_34210_length_483_cov_3.08178_g27126_i0.p1 TRINITY_DN95414_c0_g1_i1.p1
OG0000001NODE_28302_length_632_cov_2.16118_g21539_i0.p1 TRINITY_DN15305_c0_g1_i1.p1
OG0000001NODE_46001_length_339_cov_1.47887_g38781_i0.p1 TRINITY_DN35258_c0_g1_i1.p1, TRINITY_DN48018_c0_g1_i1.p1
OG0000001NODE_34829_length_472_cov_3.04556_g27730_i0.p1 TRINITY_DN25353_c0_g1_i1.p1, TRINITY_DN6949_c2_g1_i1.p1, TRINITY_DN14843_c0_g2_i1.p1, TRINITY_DN14843_c2_g1_i2.p1, TRINITY_DN80817_c0_g1_i1.p1, TRINITY_DN48199_c0_g1_i1.p1, TRINITY_DN45148_c0_g1_i1.p1, TRINITY_DN63040_c0_g1_i1.p1
OG0000001NODE_46192_length_337_cov_2.34397_g38972_i0.p1 TRINITY_DN55489_c0_g1_i1.p1, TRINITY_DN31662_c0_g1_i1.p1, TRINITY_DN13076_c1_g1_i1.p2, TRINITY_DN14349_c0_g1_i1.p1, TRINITY_DN25100_c0_g1_i1.p1, TRINITY_DN22531_c4_g1_i1.p1, TRINITY_DN45231_c0_g1_i1.p1, TRINITY_DN20907_c0_g1_i1.p1, TRINITY_DN26248_c0_g3_i1.p1, TRINITY_DN19671_c0_g1_i1.p1, TRINITY_DN97919_c0_g1_i1.p1, TRINITY_DN25454_c0_g1_i1.p1, TRINITY_DN28679_c1_g2_i1.p1, TRINITY_DN4618_c1_g1_i1.p1, TRINITY_DN8593_c0_g2_i2.p1, TRINITY_DN99727_c0_g1_i1.p1, TRINITY_DN13447_c12_g1_i1.p1, TRINITY_DN17693_c1_g4_i1.p1, TRINITY_DN23226_c2_g1_i2.p1, TRINITY_DN30052_c0_g1_i1.p1, TRINITY_DN25540_c0_g2_i1.p1, TRINITY_DN49253_c0_g1_i1.p1, TRINITY_DN17693_c0_g1_i1.p1, TRINITY_DN95374_c0_g1_i1.p1, TRINITY_DN29049_c0_g1_i1.p1, TRINITY_DN42956_c0_g1_i1.p1, TRINITY_DN23560_c1_g1_i1.p1, TRINITY_DN12924_c2_g1_i2.p1, TRINITY_DN98367_c0_g1_i1.p1, TRINITY_DN49362_c0_g1_i1.p1, TRINITY_DN31682_c0_g1_i1.p1, TRINITY_DN21469_c0_g1_i1.p1, TRINITY_DN37152_c0_g1_i1.p1, TRINITY_DN7950_c4_g2_i1.p1, TRINITY_DN22651_c1_g2_i2.p1, TRINITY_DN47058_c0_g1_i1.p1, TRINITY_DN14789_c2_g1_i1.p1, TRINITY_DN34900_c0_g1_i4.p1, TRINITY_DN17763_c1_g2_i1.p1, TRINITY_DN20440_c1_g1_i1.p1, TRINITY_DN20440_c1_g2_i1.p1, TRINITY_DN32824_c0_g1_i1.p1, TRINITY_DN14787_c1_g1_i5.p1, TRINITY_DN12206_c1_g4_i3.p1, TRINITY_DN50529_c0_g1_i1.p1, TRINITY_DN7950_c1_g1_i1.p1, TRINITY_DN29263_c1_g2_i1.p1, TRINITY_DN7719_c1_g2_i2.p1, TRINITY_DN49007_c1_g1_i1.p1, TRINITY_DN12924_c1_g2_i1.p1, TRINITY_DN22651_c1_g1_i2.p1, TRINITY_DN41432_c2_g1_i1.p2, TRINITY_DN42292_c0_g1_i2.p1, TRINITY_DN29263_c1_g1_i1.p1, TRINITY_DN11601_c1_g1_i1.p2, TRINITY_DN10794_c0_g1_i1.p1, TRINITY_DN29785_c0_g1_i1.p2, TRINITY_DN34900_c0_g2_i1.p2, TRINITY_DN41432_c0_g3_i2.p1, TRINITY_DN12924_c0_g1_i3.p1, TRINITY_DN12961_c1_g1_i1.p1, TRINITY_DN12961_c1_g2_i1.p1, TRINITY_DN102199_c0_g1_i1.p1, TRINITY_DN17763_c0_g1_i4.p1, TRINITY_DN11601_c1_g2_i3.p1, TRINITY_DN17012_c0_g2_i1.p1, TRINITY_DN38037_c0_g1_i3.p1, 
TRINITY_DN12924_c6_g1_i1.p1, TRINITY_DN34900_c0_g4_i1.p1, TRINITY_DN41432_c0_g1_i1.p1, TRINITY_DN7719_c1_g5_i1.p1, TRINITY_DN10145_c0_g2_i1.p1, TRINITY_DN13530_c2_g2_i1.p1, TRINITY_DN11601_c3_g1_i1.p1, TRINITY_DN17012_c0_g1_i1.p1, TRINITY_DN32273_c1_g1_i1.p1, TRINITY_DN2116_c1_g1_i1.p1, TRINITY_DN72805_c0_g1_i1.p1, TRINITY_DN13603_c0_g1_i1.p1, TRINITY_DN29263_c0_g1_i1.p1, TRINITY_DN79294_c0_g1_i1.p1, TRINITY_DN43501_c0_g2_i1.p1, TRINITY_DN43501_c1_g3_i1.p1, TRINITY_DN12815_c2_g1_i1.p1
OG0000001NODE_48125_length_323_cov_1.49254_g40905_i0.p1 TRINITY_DN12206_c0_g2_i4.p1
OG0000001NODE_44912_length_348_cov_2.7099_g37692_i0.p1, NODE_28173_length_636_cov_2.41308_g21422_i0.p1 TRINITY_DN31682_c2_g1_i1.p1
OG0000001NODE_40629_length_390_cov_2.93731_g33431_i0.p1 TRINITY_DN18899_c0_g1_i1.p1
OG0000001NODE_31640_length_541_cov_1.62963_g24647_i0.p1 TRINITY_DN70069_c0_g1_i1.p1
OG0000001NODE_22914_length_844_cov_2.43599_g16758_i0.p1 TRINITY_DN22758_c4_g1_i1.p1
OG0000001NODE_35387_length_463_cov_1.93873_g28279_i0.p1 TRINITY_DN10125_c8_g1_i1.p1, TRINITY_DN13808_c1_g1_i1.p1, TRINITY_DN32847_c4_g1_i1.p1
OG0000001NODE_47945_length_324_cov_1.98141_g40725_i0.p1 TRINITY_DN22785_c0_g2_i1.p1
OG0000002NODE_12535_length_1689_cov_0.0630355_g8782_i0.p1, NODE_28437_length_628_cov_2.82723_g21664_i0.p2, NODE_3347_length_3396_cov_18.6178_g520_i10.p1, NODE_14997_length_1427_cov_0_g10539_i0.p1 TRINITY_DN404_c2_g3_i4.p2
OG0000002NODE_4956_length_2921_cov_2.57223_g2051_i2.p1 TRINITY_DN9569_c0_g2_i11.p1
OG0000002NODE_22408_length_871_cov_2.69975_g16319_i0.p1, NODE_5367_length_2819_cov_19.3086_g3815_i0.p1, NODE_28308_length_631_cov_95.8125_g21545_i0.p1, NODE_6953_length_2510_cov_11.9992_g520_i18.p1, NODE_10927_length_1893_cov_2.39336_g4165_i1.p1, NODE_11305_length_1840_cov_1.18711_g4165_i2.p1, NODE_15729_length_1355_cov_47.6892_g11061_i0.p1, NODE_8744_length_2200_cov_2.17156_g5833_i1.p1, NODE_14664_length_1461_cov_3.93528_g10288_i0.p1, NODE_32110_length_529_cov_3.20042_g25097_i0.p1, NODE_14558_length_1472_cov_5.75865_g10212_i0.p1, NODE_3208_length_3448_cov_0.146183_g2254_i0.p1, NODE_1675_length_4228_cov_6.33717_g141_i2.p1, NODE_3119_length_3480_cov_10.7247_g141_i3.p1, NODE_41790_length_378_cov_1.40867_g34577_i0.p1, NODE_8250_length_2283_cov_0.552513_g5833_i0.p1, NODE_33063_length_507_cov_3.07522_g26013_i0.p1, NODE_14153_length_1512_cov_2.57996_g9916_i0.p1, NODE_15201_length_1407_cov_1.23817_g10683_i0.p1, NODE_11206_length_1853_cov_94.7675_g5098_i2.p1, NODE_8485_length_2244_cov_88.7095_g520_i22.p1, NODE_2956_length_3551_cov_54.083_g520_i7.p1, NODE_25638_length_726_cov_39.7571_g19123_i0.p1, NODE_7192_length_2461_cov_26.6604_g5098_i0.p1, NODE_11687_length_1791_cov_103.518_g520_i28.p1, NODE_8381_length_2262_cov_69.3969_g520_i21.p1, NODE_12059_length_1748_cov_21.8399_g4376_i1.p1, NODE_15552_length_1373_cov_29.7481_g4376_i2.p1, NODE_6156_length_2663_cov_11.6223_g4376_i0.p1, NODE_11306_length_1840_cov_0.769188_g7923_i0.p1, NODE_3069_length_3499_cov_23.0012_g2153_i0.p1, NODE_17797_length_1173_cov_3.90072_g12655_i0.p1, NODE_6182_length_2659_cov_20.8802_g520_i16.p1, NODE_21118_length_944_cov_17.5681_g15242_i0.p1, NODE_27568_length_655_cov_114.768_g20854_i0.p1, NODE_6534_length_2587_cov_0.330964_g4638_i0.p1, NODE_12067_length_1748_cov_0.99114_g8449_i0.p1, NODE_36226_length_450_cov_0.888608_g29097_i0.p1, NODE_11149_length_1861_cov_0.440753_g7819_i0.p1, NODE_3216_length_3443_cov_0.0876623_g2261_i0.p1TRINITY_DN4745_c0_g1_i8.p1
OG0000002NODE_7178_length_2463_cov_1.08347_g3220_i1.p1 TRINITY_DN45271_c0_g1_i1.p1, TRINITY_DN80023_c0_g1_i1.p1
OG0000003NODE_12389_length_1705_cov_3.35455_g8689_i0.p1 TRINITY_DN88514_c0_g1_i1.p1, TRINITY_DN98908_c0_g1_i1.p1
OG0000003NODE_12124_length_1740_cov_6.73294_g8494_i0.p1 TRINITY_DN59493_c0_g1_i1.p1
OG0000004NODE_11402_length_1826_cov_23.1299_g7985_i0.p1, NODE_15554_length_1373_cov_8.95068_g10926_i0.p1, NODE_8530_length_2238_cov_16.2698_g6039_i0.p1, NODE_9752_length_2052_cov_15.6019_g6039_i1.p1, NODE_24349_length_779_cov_3.33702_g17997_i0.p2, NODE_13080_length_1628_cov_0.624921_g6039_i2.p1, NODE_17733_length_1178_cov_1.83972_g6039_i4.p1, NODE_15496_length_1379_cov_11.0989_g6039_i3.p1, NODE_13267_length_1609_cov_12.7915_g9295_i0.p1, NODE_13291_length_1607_cov_6.24678_g9312_i0.p1, NODE_46924_length_331_cov_7.30797_g39704_i0.p1, NODE_14195_length_1507_cov_24.635_g9949_i0.p1, NODE_15844_length_1345_cov_7.19302_g11144_i0.p1, NODE_24824_length_760_cov_3.35319_g18402_i0.p1, NODE_30053_length_581_cov_28.5932_g23152_i0.p1, NODE_15219_length_1405_cov_19.9141_g10695_i0.p1, NODE_18899_length_1089_cov_25.1634_g10695_i1.p1, NODE_23176_length_831_cov_10.9356_g16989_i0.p1, NODE_3027_length_3524_cov_58.4552_g2119_i0.p1, NODE_7122_length_2476_cov_60.8207_g2119_i2.p1, NODE_5694_length_2750_cov_57.9711_g2119_i1.p1, NODE_9374_length_2106_cov_72.7889_g2119_i4.p1, NODE_28720_length_618_cov_91.5382_g21916_i0.p1, NODE_48287_length_321_cov_18.6992_g41067_i0.p1, NODE_8313_length_2274_cov_0.764308_g5880_i0.p1 TRINITY_DN5983_c0_g1_i13.p1
OG0000004NODE_51644_length_300_cov_2.8898_g44424_i0.p1 TRINITY_DN2911_c0_g1_i6.p1, TRINITY_DN10777_c0_g1_i3.p2, TRINITY_DN3108_c4_g1_i1.p1
OG0000005NODE_4258_length_3121_cov_2.23353_g3017_i0.p1 TRINITY_DN73_c0_g1_i2.p1
OG0000005NODE_9431_length_2097_cov_0.294319_g6660_i0.p1, NODE_17549_length_1192_cov_0_g12477_i0.p1 TRINITY_DN10379_c0_g1_i1.p1
OG0000005NODE_16592_length_1277_cov_280.146_g11720_i0.p1, NODE_16400_length_1294_cov_268.287_g11568_i0.p1, NODE_16807_length_1257_cov_6539.91_g11893_i0.p2 TRINITY_DN96035_c0_g1_i1.p1
OG0000005NODE_18874_length_1091_cov_31.9382_g13487_i0.p1 TRINITY_DN56829_c0_g1_i1.p1
OG0023101NODE_38511_length_416_cov_1.4072_g31343_i0.p1 TRINITY_DN39048_c0_g1_i1.p1
OG0023168NODE_11698_length_1791_cov_4.08295_g8183_i0.p1 TRINITY_DN24919_c0_g3_i1.p1
OG0023225NODE_41675_length_379_cov_2.0679_g34462_i0.p1 TRINITY_DN2507_c0_g1_i2.p1
OG0023805NODE_25367_length_737_cov_5.03519_g18878_i0.p1 TRINITY_DN60585_c0_g1_i1.p1
OG0024343NODE_37984_length_423_cov_7.50272_g30820_i0.p1 TRINITY_DN19131_c0_g1_i1.p2
OG0024357NODE_49799_length_312_cov_1.28405_g42579_i0.p1 TRINITY_DN26024_c0_g2_i1.p1
OG0024378NODE_17256_length_1217_cov_3.78485_g12255_i0.p1, NODE_31882_length_535_cov_2.81875_g24876_i0.p1TRINITY_DN35821_c0_g1_i1.p1
OG0024379NODE_41962_length_376_cov_1.75078_g34748_i0.p2, NODE_23046_length_837_cov_5.55243_g16874_i0.p3 TRINITY_DN3810_c0_g1_i11.p2
OG0024391NODE_48407_length_321_cov_1.43233_g41187_i0.p1 TRINITY_DN48119_c0_g1_i1.p1
OG0024400NODE_41899_length_376_cov_20.486_g34686_i0.p1 TRINITY_DN56783_c0_g1_i1.p1
OG0024403NODE_33682_length_494_cov_2.90433_g26615_i0.p1 TRINITY_DN6216_c3_g2_i2.p1
OG0024406NODE_4601_length_3018_cov_0.57003_g3248_i0.p1 TRINITY_DN6703_c1_g1_i1.p1
OG0024408NODE_13384_length_1597_cov_15.2536_g9375_i0.p1 TRINITY_DN76745_c0_g1_i1.p1
OG0024409NODE_3257_length_3425_cov_7.75519_g1638_i1.p1, NODE_9985_length_2020_cov_0.0727735_g454_i4.p1 TRINITY_DN76984_c0_g1_i1.p1
OG0024412NODE_20393_length_988_cov_7.68596_g14661_i0.p1 TRINITY_DN86058_c0_g1_i1.p1
OG0034999NODE_28914_length_613_cov_1.46416_g22095_i0.p2 TRINITY_DN101141_c0_g1_i1.p1
OG0035024NODE_42599_length_369_cov_3.1242_g35382_i0.p1 TRINITY_DN11709_c0_g1_i1.p1
OG0035156NODE_26863_length_680_cov_0_g20215_i0.p1 TRINITY_DN33400_c0_g1_i1.p1
OG0035194NODE_22731_length_853_cov_5.32456_g16592_i0.p1 TRINITY_DN4231_c1_g1_i1.p1
OG0035204NODE_30151_length_579_cov_2.46565_g23242_i0.p1 TRINITY_DN46148_c0_g1_i1.p1
OG0035256NODE_875_length_5081_cov_0_g610_i0.p1 TRINITY_DN59839_c0_g1_i1.p1
OG0035265NODE_33029_length_508_cov_2.31788_g25982_i0.p1 TRINITY_DN62743_c0_g1_i1.p1
OG0035284NODE_50035_length_310_cov_2.64706_g42815_i0.p1 TRINITY_DN69684_c0_g1_i1.p1
OG0035312NODE_46495_length_335_cov_1.78571_g39275_i0.p1 TRINITY_DN78151_c0_g1_i1.p1
OG0035321NODE_12599_length_1679_cov_0.044335_g8832_i0.p1 TRINITY_DN82725_c0_g1_i1.p1
OG0035324NODE_39494_length_404_cov_1.18052_g32312_i0.p1 TRINITY_DN83461_c0_g1_i1.p1
OG0035328NODE_50425_length_308_cov_1.39921_g43205_i0.p1 TRINITY_DN85906_c0_g1_i1.p1
OG0035329NODE_31547_length_542_cov_2245.63_g24558_i0.p1 TRINITY_DN861_c2_g1_i1.p1
OG0035335NODE_22694_length_855_cov_7.32625_g16559_i0.p1 TRINITY_DN88759_c0_g1_i1.p1
OG0035357NODE_20284_length_996_cov_17.0319_g14574_i0.p1 TRINITY_DN96805_c0_g1_i1.p1

Determine the orthogroups that contain V. tsujii genes significantly upregulated in the bioluminescent upper lip

In [162]:
vtsujii_bio_upper_lip_ids <- subset(vtsujii_bio_upper_lip, select = "transcript_id")
In [163]:
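# Redefine check_and_subset() to search the V. tsujii column
# (Vargula_tsujii_cdhit_95.fasta.transdecoder) instead of Skogsbergia_sp.
# For each transcript ID in df1, keep every row of df2 whose comma-separated
# V. tsujii transcript list contains that ID. str_detect() treats the ID as a
# regular expression and matches substrings (partial matches are accepted).

check_and_subset <- function(df1, df2) {
  matched_rows_list <- list()
  for (i in seq_len(nrow(df1))) {
    char_row <- df1$transcript_id[[i]]
    matched_rows <- df2[str_detect(df2$Vargula_tsujii_cdhit_95.fasta.transdecoder, char_row), , drop = FALSE]
    if (nrow(matched_rows) > 0) {
      matched_rows_list[[i]] <- matched_rows
    }
  }
  # bind_rows() skips the NULL slots left by IDs without a match
  if (length(matched_rows_list) > 0) {
    bind_rows(matched_rows_list)
  } else {
    NULL
  }
}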
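check_and_subset() is redefined several times in this section, and the definitions differ only in the column searched. A hypothetical parameterised version, `check_and_subset_col()`, takes the column name as an argument. It is written in base R and uses `grepl(fixed = TRUE)`, so IDs are matched as literal substrings rather than as regular expressions. That is a small, deliberate difference from `str_detect()`. Like the original, it returns one block of matching rows per input ID, so an orthogroup hit by several IDs appears several times until `distinct()` is applied. The toy tables are assumptions.

```r
# Hypothetical parameterised variant of check_and_subset(); base R only
check_and_subset_col <- function(df1, df2, col, id_col = "transcript_id") {
  ids <- as.character(df1[[id_col]])
  # One data frame of matching df2 rows per ID, kept in the order of df1
  pieces <- lapply(ids, function(id) {
    # fixed = TRUE: the ID is a literal string, not a regular expression;
    # this is still a substring match, as in the original function
    df2[grepl(id, df2[[col]], fixed = TRUE), , drop = FALSE]
  })
  pieces <- pieces[vapply(pieces, nrow, integer(1)) > 0]
  if (length(pieces) == 0) {
    return(NULL)
  }
  do.call(rbind, pieces)
}

# Toy ortholog table with the same column names as the notebook's table
orthologs <- data.frame(
  Orthogroup = c("OG1", "OG2"),
  Vargula_tsujii_cdhit_95.fasta.transdecoder = c("NODE_1_g1_i0.p1, NODE_2_g2_i0.p1",
                                                 "NODE_3_g3_i0.p1"),
  Skogsbergia_sp = c("TRINITY_DN1_c0_g1_i1.p1", "TRINITY_DN2_c0_g1_i1.p1"),
  stringsAsFactors = FALSE
)
vt_de <- data.frame(transcript_id = c("NODE_1_g1_i0", "NODE_2_g2_i0", "NODE_9_g9_i0"),
                    stringsAsFactors = FALSE)
sk_de <- data.frame(transcript_id = "TRINITY_DN2_c0_g1_i1", stringsAsFactors = FALSE)

vt_hits <- check_and_subset_col(vt_de, orthologs, "Vargula_tsujii_cdhit_95.fasta.transdecoder")
sk_hits <- check_and_subset_col(sk_de, orthologs, "Skogsbergia_sp")
vt_hits$Orthogroup          # "OG1" "OG1" (two IDs hit the same row)
unique(vt_hits$Orthogroup)  # "OG1"
sk_hits$Orthogroup          # "OG2"
```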
In [164]:
# Use the function
vtsujii_bio_upper_lip_orthogroups <- check_and_subset(vtsujii_bio_upper_lip_ids,
                                                      vargulatsujii_v_skogsbergia_orthologs_factor)
In [14]:
head(vtsujii_bio_upper_lip_orthogroups)
A data.frame: 6 × 3
  | Orthogroup | Vargula_tsujii_cdhit_95.fasta.transdecoder | Skogsbergia_sp
  | <chr> | <chr> | <chr>
1 | OG0000094 | NODE_10049_length_2009_cov_1010.67_g4245_i1.p1, NODE_10354_length_1968_cov_1026.02_g4245_i3.p1, NODE_12136_length_1738_cov_717.411_g4245_i8.p1, NODE_17756_length_1175_cov_1962.42_g4245_i11.p1, NODE_5975_length_2694_cov_957.783_g4245_i0.p1, NODE_10130_length_1999_cov_0_g7141_i0.p1, NODE_41836_length_377_cov_2.5_g34623_i0.p1 | TRINITY_DN20093_c0_g2_i1.p1, TRINITY_DN46874_c0_g1_i1.p1, TRINITY_DN60232_c0_g1_i1.p1, TRINITY_DN81872_c0_g1_i1.p1, TRINITY_DN89509_c0_g1_i1.p1, TRINITY_DN77986_c0_g1_i1.p1, TRINITY_DN83069_c0_g1_i1.p1
2 | OG0000125 | NODE_18633_length_1110_cov_3.7346_g13293_i0.p1, NODE_10092_length_2003_cov_126.275_g7115_i0.p1, NODE_12296_length_1715_cov_4.03494_g8622_i0.p1 | TRINITY_DN14481_c0_g1_i2.p1, TRINITY_DN6380_c0_g1_i1.p1, TRINITY_DN678_c2_g4_i1.p1, TRINITY_DN678_c1_g1_i6.p1, TRINITY_DN20187_c1_g1_i2.p1, TRINITY_DN24839_c0_g2_i1.p1
3 | OG0003648 | NODE_25333_length_738_cov_4.83016_g18846_i0.p1, NODE_102_length_8050_cov_2.21326_g66_i0.p1 | TRINITY_DN13994_c0_g2_i1.p1, TRINITY_DN7576_c0_g1_i2.p2, TRINITY_DN349_c0_g1_i1.p1, TRINITY_DN5035_c0_g1_i4.p1, TRINITY_DN7576_c0_g1_i2.p1, TRINITY_DN7771_c0_g2_i8.p1
4 | OG0000094 | NODE_10049_length_2009_cov_1010.67_g4245_i1.p1, NODE_10354_length_1968_cov_1026.02_g4245_i3.p1, NODE_12136_length_1738_cov_717.411_g4245_i8.p1, NODE_17756_length_1175_cov_1962.42_g4245_i11.p1, NODE_5975_length_2694_cov_957.783_g4245_i0.p1, NODE_10130_length_1999_cov_0_g7141_i0.p1, NODE_41836_length_377_cov_2.5_g34623_i0.p1 | TRINITY_DN20093_c0_g2_i1.p1, TRINITY_DN46874_c0_g1_i1.p1, TRINITY_DN60232_c0_g1_i1.p1, TRINITY_DN81872_c0_g1_i1.p1, TRINITY_DN89509_c0_g1_i1.p1, TRINITY_DN77986_c0_g1_i1.p1, TRINITY_DN83069_c0_g1_i1.p1
5 | OG0000074 | NODE_7141_length_2474_cov_0.620091_g5066_i0.p1, NODE_10457_length_1956_cov_0.983167_g7363_i0.p1, NODE_10803_length_1909_cov_0.69795_g7363_i1.p1 | TRINITY_DN10224_c0_g1_i3.p1, TRINITY_DN14693_c1_g1_i3.p1, TRINITY_DN7570_c1_g2_i1.p1, TRINITY_DN13885_c0_g1_i2.p1, TRINITY_DN14693_c0_g1_i1.p1, TRINITY_DN7598_c0_g1_i1.p1, TRINITY_DN52655_c0_g1_i1.p1, TRINITY_DN47414_c1_g1_i1.p1, TRINITY_DN67932_c0_g1_i1.p1, TRINITY_DN5934_c1_g2_i1.p1, TRINITY_DN7570_c1_g1_i1.p1, TRINITY_DN34522_c0_g1_i1.p1, TRINITY_DN25109_c3_g1_i1.p1, TRINITY_DN7598_c0_g2_i2.p1, TRINITY_DN14693_c0_g2_i1.p1, TRINITY_DN72619_c0_g1_i1.p1
6 | OG0005447 | NODE_10653_length_1930_cov_10.2245_g7492_i0.p1 | TRINITY_DN1146_c0_g1_i1.p2, TRINITY_DN1146_c0_g1_i1.p1
In [166]:
#extract all the orthogroups 
vtsujii_bio_upper_lip_orthogroups_numbers <- subset(vtsujii_bio_upper_lip_orthogroups, select = "Orthogroup")
In [167]:
head(vtsujii_bio_upper_lip_orthogroups_numbers)
A data.frame: 6 × 1
  | Orthogroup
  | <chr>
1 | OG0000094
2 | OG0000125
3 | OG0003648
4 | OG0000094
5 | OG0000074
6 | OG0005447
In [168]:
vtsujii_bio_upper_lip_orthogroups_numbers_rmdup <- vtsujii_bio_upper_lip_orthogroups_numbers %>% distinct()
In [169]:
nrow(vtsujii_bio_upper_lip_orthogroups_numbers_rmdup)
122

Determine the orthogroups that contain Skogsbergia sp. genes significantly upregulated in the upper lip

In [170]:
#significantly upregulated genes (uniquely expressed) in the upper lip from Section 4.4.4
head(unique_genes_upper_lip_info)
A tibble: 6 × 7
transcript_id | baseMean | log2FoldChange | lfcSE | stat | pvalue | padj
<chr> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl>
TRINITY_DN100127_c0_g1_i1 | 3.608228 | 2.729312 | 0.9341722 | 2.880301 | 3.972962e-03 | 3.335116e-02
TRINITY_DN10031_c3_g1_i1 | 16.544037 | 1.833604 | 0.3699002 | 4.954538 | 7.250222e-07 | 3.064050e-05
TRINITY_DN100493_c0_g1_i1 | 4.249253 | 3.571458 | 1.1454913 | 3.046927 | 2.311939e-03 | 2.223371e-02
TRINITY_DN10077_c0_g2_i5 | 22.154294 | 1.221858 | 0.4466248 | 2.735470 | 6.229131e-03 | 4.641496e-02
TRINITY_DN10099_c0_g1_i3 | 28.666973 | 2.985002 | 0.9135207 | 3.270500 | 1.073577e-03 | 1.242571e-02
TRINITY_DN10116_c0_g1_i1 | 17.029354 | 5.542207 | 1.0355757 | 5.142149 | 2.716139e-07 | 1.331053e-05
In [171]:
colnames(unique_genes_upper_lip_info)[1] <- "transcript_id"
In [172]:
skogs_upper_lip_ids <- subset(unique_genes_upper_lip_info, select = "transcript_id")
In [173]:
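# Return the rows of an orthogroup table (df2) whose Skogsbergia_sp column
# contains any of the transcript IDs in df1$transcript_id.
# To reuse it for another species, change the column names it refers to.
# Note: str_detect() treats each ID as a regular expression and matches substrings.

check_and_subset <- function(df1, df2) {
  matched_rows_list <- list()
  for (i in seq_len(nrow(df1))) {
    char_row <- df1[i, "transcript_id"]
    # rows of df2 whose comma-separated Skogsbergia_sp field contains this transcript ID
    matched_rows <- df2[str_detect(df2$Skogsbergia_sp, paste0(char_row, collapse = "|")), , drop = FALSE]
    if (nrow(matched_rows) > 0) {
      matched_rows_list[[i]] <- matched_rows
    }
  }
  # combine all matches; an orthogroup appears once per matching transcript
  if (length(matched_rows_list) > 0) {
    matched_df <- bind_rows(matched_rows_list)
    return(matched_df)
  } else {
    return(NULL)
  }
}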
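Because `check_and_subset()` passes each transcript ID to `str_detect()` as a regular expression, an ID also matches any longer ID that contains it (for example an `_i1` isoform matches `_i12`), and `.` acts as a wildcard. If exact matching is preferred, the comma-separated cells can be split into individual IDs first. The helpers `strip_orf_suffix()` and `match_orthogroups_exact()` and the toy table below are hypothetical; the sketch assumes cells hold comma-separated TransDecoder IDs ending in `.pN`, as in the tables above.

```r
# Remove the TransDecoder ".pN" ORF suffix to recover transcript IDs (hypothetical helper)
strip_orf_suffix <- function(ids) {
  sub("\\.p[0-9]+$", "", trimws(ids))
}

# Keep rows of ortho_df where any ID in `col` exactly equals a query transcript ID
match_orthogroups_exact <- function(transcript_ids, ortho_df, col = "Skogsbergia_sp") {
  tokens <- lapply(strsplit(as.character(ortho_df[[col]]), ","), strip_orf_suffix)
  keep <- vapply(tokens, function(tok) any(tok %in% transcript_ids), logical(1))
  ortho_df[keep, , drop = FALSE]
}

# Toy orthogroup table (hypothetical IDs)
toy_ortho <- data.frame(
  Orthogroup = c("OG0000001", "OG0000002", "OG0000003"),
  Skogsbergia_sp = c("TRINITY_DN1144_c0_g1_i1.p1, TRINITY_DN858_c0_g1_i1.p1",
                     "TRINITY_DN1144_c0_g1_i12.p1",
                     "TRINITY_DN42_c0_g1_i1.p2"),
  stringsAsFactors = FALSE
)

hits <- match_orthogroups_exact("TRINITY_DN1144_c0_g1_i1", toy_ortho)
print(hits$Orthogroup)  # "OG0000001"

# A substring match also picks up the "_i12" isoform in the second row
substring_hits <- grepl("TRINITY_DN1144_c0_g1_i1", toy_ortho$Skogsbergia_sp, fixed = TRUE)
print(substring_hits)   # TRUE TRUE FALSE
```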
In [174]:
# use the function 
skogs_upper_lip_orthogroups <- check_and_subset(skogs_upper_lip_ids, vargulatsujii_v_skogsbergia_orthologs_factor)
In [15]:
head(skogs_upper_lip_orthogroups)
A data.frame: 6 × 3
  | Orthogroup | Vargula_tsujii_cdhit_95.fasta.transdecoder | Skogsbergia_sp
  | <chr> | <chr> | <chr>
1 | OG0010754 | NODE_36166_length_451_cov_1.40657_g29039_i0.p1 | TRINITY_DN101713_c0_g1_i1.p1
2 | OG0001010 | NODE_6046_length_2683_cov_0.0684932_g4293_i0.p1 | TRINITY_DN10274_c1_g2_i1.p1
3 | OG0014778 | NODE_22421_length_870_cov_4.98528_g16330_i0.p1 | TRINITY_DN11085_c0_g1_i2.p1
4 | OG0005125 | NODE_11298_length_1841_cov_8.13046_g7916_i0.p1 | TRINITY_DN71250_c0_g1_i1.p2, TRINITY_DN11410_c0_g1_i1.p1
5 | OG0003955 | NODE_10505_length_1950_cov_293.099_g7393_i0.p1 | TRINITY_DN1144_c0_g1_i1.p1, TRINITY_DN1144_c0_g2_i1.p1, TRINITY_DN858_c0_g1_i1.p1, TRINITY_DN3454_c1_g1_i1.p1, TRINITY_DN3579_c0_g1_i1.p1
6 | OG0000533 | NODE_506_length_5838_cov_8.43438_g347_i0.p1 | TRINITY_DN1164_c0_g1_i2.p1, TRINITY_DN5061_c0_g1_i6.p1
In [176]:
skogs_upper_lip_orthogroups_numbers <- subset(skogs_upper_lip_orthogroups, select = "Orthogroup" )
In [177]:
skogs_upper_lip_orthogroups_numbers_rmdup <- skogs_upper_lip_orthogroups_numbers %>% distinct()
In [178]:
nrow(skogs_upper_lip_orthogroups_numbers_rmdup)
87

Determine the number of orthogroups with significantly upregulated genes shared between the luminous upper lip (V. tsujii) and the non-luminous upper lip (Skogsbergia sp.).

In [16]:
head(vtsujii_bio_upper_lip_orthogroups_numbers_rmdup$Orthogroup)

head(skogs_upper_lip_orthogroups_numbers_rmdup$Orthogroup)
  1. 'OG0000094'
  2. 'OG0000125'
  3. 'OG0003648'
  4. 'OG0000074'
  5. 'OG0005447'
  6. 'OG0002524'
  1. 'OG0010754'
  2. 'OG0001010'
  3. 'OG0014778'
  4. 'OG0005125'
  5. 'OG0003955'
  6. 'OG0000533'
In [180]:
#convert columns to characters for the function below 

vtsujii_bio_upper_lip_orthogroups_numbers_rmdup_chr <- vtsujii_bio_upper_lip_orthogroups_numbers_rmdup %>% mutate(Orthogroup = as.character(Orthogroup))
In [181]:
skogs_upper_lip_orthogroups_numbers_rmdup_chr <- skogs_upper_lip_orthogroups_numbers_rmdup %>% mutate(Orthogroup = as.character(Orthogroup))
In [183]:
#start with the longer list (V. tsujii) and flag each orthogroup "yes" if it also
#contains Skogsbergia sp. upper lip upregulated genes; each Orthogroup cell holds
#a single ID, so this amounts to a set-membership test
vtsujii_skogs_upper_lips_orthogroups_numbers_match <- vtsujii_bio_upper_lip_orthogroups_numbers_rmdup_chr %>%
    mutate(match = c("no", "yes")[1 + (rowSums(
        outer(
            strsplit(Orthogroup, "\\s+"),
            strsplit(skogs_upper_lip_orthogroups_numbers_rmdup_chr$Orthogroup, "\\s+"),
            Vectorize(function(x, y) all(x %in% y) | all(y %in% x))
        )
    ) > 0)])
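Since every `Orthogroup` cell holds one ID, the `outer()`/`Vectorize()` test reduces to plain set membership. A small sketch on hypothetical IDs that checks the two formulations agree:

```r
# Toy orthogroup vectors (hypothetical IDs)
vtsujii_og <- c("OG0000094", "OG0000125", "OG0003648", "OG0002524")
skogs_og   <- c("OG0010754", "OG0000094", "OG0002524")

# Set-membership version
match_flag <- ifelse(vtsujii_og %in% skogs_og, "yes", "no")
shared_og  <- intersect(vtsujii_og, skogs_og)

# The notebook's outer()/Vectorize() version, applied to the same vectors
outer_flag <- c("no", "yes")[1 + (rowSums(outer(
  strsplit(vtsujii_og, "\\s+"),
  strsplit(skogs_og, "\\s+"),
  Vectorize(function(x, y) all(x %in% y) | all(y %in% x))
)) > 0)]

print(match_flag)  # "yes" "no" "no" "yes"
print(shared_og)   # "OG0000094" "OG0002524"
```

If a cell ever held several space-separated IDs, the original test would flag a subset relation in either direction, which plain `%in%` does not capture.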
In [184]:
head(vtsujii_skogs_upper_lips_orthogroups_numbers_match)
A data.frame: 6 × 2
  | Orthogroup | match
  | <chr> | <chr>
1 | OG0000094 | yes
2 | OG0000125 | no
3 | OG0003648 | no
4 | OG0000074 | no
5 | OG0005447 | no
6 | OG0002524 | yes
In [185]:
vtsujii_skogs_upper_lips_orthogroups_numbers_match_yes <- vtsujii_skogs_upper_lips_orthogroups_numbers_match %>% filter(match=="yes")
In [186]:
vtsujii_skogs_upper_lips_orthogroups_numbers_match_yes
A data.frame: 17 × 2
Orthogroup | match
<chr> | <chr>
OG0000094 | yes
OG0002524 | yes
OG0002955 | yes
OG0000051 | yes
OG0001479 | yes
OG0000579 | yes
OG0012037 | yes
OG0001390 | yes
OG0001028 | yes
OG0007820 | yes
OG0002252 | yes
OG0000218 | yes
OG0001010 | yes
OG0001422 | yes
OG0000107 | yes
OG0001359 | yes
OG0011886 | yes
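The three regions of a Venn diagram of the two de-duplicated orthogroup lists follow directly from set operations. A sketch with a hypothetical helper, `venn_counts()`, on toy IDs:

```r
# Partition two sets of IDs into the regions of a two-set Venn diagram
venn_counts <- function(a, b) {
  a <- unique(a)
  b <- unique(b)
  c(only_a = length(setdiff(a, b)),
    shared = length(intersect(a, b)),
    only_b = length(setdiff(b, a)),
    union  = length(union(a, b)))
}

# Toy example (hypothetical IDs)
toy <- venn_counts(c("OG1", "OG2", "OG3"), c("OG2", "OG3", "OG4", "OG5"))
print(toy)  # only_a 1, shared 2, only_b 2, union 5
```

With the counts reported above (122 V. tsujii orthogroups, 87 Skogsbergia sp. orthogroups, 17 shared), the partition is 105 / 17 / 70, for 192 orthogroups in total.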
In [187]:
#take the Orthogroup numbers  
vtsujii_skogs_upper_lips_orthogroups_numbers_match_yes_ch <- as.character(vtsujii_skogs_upper_lips_orthogroups_numbers_match_yes$Orthogroup)
In [188]:
vargulatsujii_v_skogsbergia_SHARED_ORTHOGROUPS_UPPER_LIPS <-  vargulatsujii_v_skogsbergia_orthologs_factor %>% filter(Orthogroup %in% vtsujii_skogs_upper_lips_orthogroups_numbers_match_yes_ch)
In [17]:
head(vargulatsujii_v_skogsbergia_SHARED_ORTHOGROUPS_UPPER_LIPS)
A data.frame: 6 × 3
  | Orthogroup | Vargula_tsujii_cdhit_95.fasta.transdecoder | Skogsbergia_sp
  | <chr> | <chr> | <chr>
1 | OG0000051 | NODE_2487_length_3782_cov_1.11215_g1734_i0.p1 | TRINITY_DN42726_c0_g1_i1.p1
2 | OG0000051 | NODE_11982_length_1756_cov_29.0747_g8390_i0.p1 | TRINITY_DN14579_c0_g1_i7.p1, TRINITY_DN72085_c0_g1_i1.p1, TRINITY_DN85620_c0_g1_i1.p1, TRINITY_DN8220_c0_g11_i1.p1, TRINITY_DN8220_c0_g3_i1.p2, TRINITY_DN8220_c0_g3_i1.p1, TRINITY_DN8220_c0_g8_i1.p1, TRINITY_DN5161_c0_g1_i1.p1, TRINITY_DN8220_c0_g2_i1.p1, TRINITY_DN33742_c0_g1_i1.p1, TRINITY_DN58301_c0_g1_i1.p1, TRINITY_DN8220_c0_g13_i1.p1, TRINITY_DN8220_c0_g7_i1.p1, TRINITY_DN6369_c0_g1_i1.p1, TRINITY_DN8220_c1_g1_i1.p1
3 | OG0000051 | NODE_11254_length_1847_cov_0.00837054_g7891_i0.p1, NODE_10966_length_1887_cov_3.9361_g7703_i0.p1 | TRINITY_DN6265_c0_g1_i1.p1
4 | OG0000094 | NODE_5877_length_2713_cov_48.2983_g4177_i0.p1 | TRINITY_DN16483_c0_g1_i1.p1
5 | OG0000094 | NODE_10049_length_2009_cov_1010.67_g4245_i1.p1, NODE_10354_length_1968_cov_1026.02_g4245_i3.p1, NODE_12136_length_1738_cov_717.411_g4245_i8.p1, NODE_17756_length_1175_cov_1962.42_g4245_i11.p1, NODE_5975_length_2694_cov_957.783_g4245_i0.p1, NODE_10130_length_1999_cov_0_g7141_i0.p1, NODE_41836_length_377_cov_2.5_g34623_i0.p1 | TRINITY_DN20093_c0_g2_i1.p1, TRINITY_DN46874_c0_g1_i1.p1, TRINITY_DN60232_c0_g1_i1.p1, TRINITY_DN81872_c0_g1_i1.p1, TRINITY_DN89509_c0_g1_i1.p1, TRINITY_DN77986_c0_g1_i1.p1, TRINITY_DN83069_c0_g1_i1.p1
6 | OG0000094 | NODE_9862_length_2035_cov_23.9697_g6956_i0.p1 | TRINITY_DN6920_c0_g1_i1.p1

Extract all V. tsujii significantly upregulated bioluminescent upper lip genes from orthogroups shared across both upper lips

In [190]:
#now extract all transcripts 
vargulatsujii_v_skogsbergia_SHARED_ORTHOGROUPS_BIO_UPPER_LIPS_transcript_ids <- as.character(vargulatsujii_v_skogsbergia_SHARED_ORTHOGROUPS_UPPER_LIPS$Vargula_tsujii_cdhit_95.fasta.transdecoder)
In [191]:
vargulatsujii_v_skogsbergia_SHARED_ORTHOGROUPS_BIO_UPPER_LIPS_transcript_ids_unlist  <- unlist(strsplit(vargulatsujii_v_skogsbergia_SHARED_ORTHOGROUPS_BIO_UPPER_LIPS_transcript_ids,","))
In [192]:
#remove the TransDecoder ORF suffixes (.p1, .p2) to recover transcript IDs ("." is a regex wildcard here)
vargulatsujii_v_skogsbergia_SHARED_ORTHOGROUPS_BIO_UPPER_LIPS_transcript_ids_unlist_removep <- vargulatsujii_v_skogsbergia_SHARED_ORTHOGROUPS_BIO_UPPER_LIPS_transcript_ids_unlist %>% str_replace("(.p1)", "") %>% 
str_replace("(.p2)", "")
In [194]:
#remove extra space in front of transcript ids

vargulatsujii_v_skogsbergia_SHARED_ORTHOGROUPS_BIO_UPPER_LIPS_transcript_ids_unlist_removep_trimws <- trimws(vargulatsujii_v_skogsbergia_SHARED_ORTHOGROUPS_BIO_UPPER_LIPS_transcript_ids_unlist_removep)
In [195]:
vtsujii_bio_upper_lip_orthogroups_transcript_ids <- vtsujii_bio_upper_lip %>% filter (transcript_id %in% vargulatsujii_v_skogsbergia_SHARED_ORTHOGROUPS_BIO_UPPER_LIPS_transcript_ids_unlist_removep_trimws)
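The two `str_replace()` calls remove only `.p1` and `.p2`, and because the pattern is unanchored with an unescaped `.`, any `.p3` (or higher) suffix would stay in place. An anchored pattern handles any ORF number. A sketch with a hypothetical helper, `peptides_to_transcripts()`, compared with the two-step replacement (written with base `sub()`, which, like `str_replace()`, replaces the first match). The first two cells reuse IDs from the table above; the `.p3` entry is made up:

```r
# Flatten comma-separated TransDecoder peptide IDs into unique transcript IDs
peptides_to_transcripts <- function(cells) {
  ids <- unlist(strsplit(as.character(cells), ","), use.names = FALSE)
  ids <- trimws(ids)                  # drop the space after each comma
  ids <- sub("\\.p[0-9]+$", "", ids)  # drop any ".pN" suffix, anchored at the end
  unique(ids[ids != ""])
}

toy_cells <- c(
  "NODE_5877_length_2713_cov_48.2983_g4177_i0.p1",
  "NODE_10049_length_2009_cov_1010.67_g4245_i1.p1, NODE_10354_length_1968_cov_1026.02_g4245_i3.p1",
  "TRINITY_DN1146_c0_g1_i1.p2, TRINITY_DN1146_c0_g1_i1.p1, TRINITY_DN1146_c0_g1_i1.p3"
)

transcripts <- peptides_to_transcripts(toy_cells)

# Two-step replacement as in the notebook: ".p3" survives
legacy <- sub("(.p2)", "", sub("(.p1)", "", trimws(unlist(strsplit(toy_cells, ",")))))
print(transcripts)
print(legacy)
```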
In [18]:
head(vtsujii_bio_upper_lip_orthogroups_transcript_ids)
A data.frame: 6 × 22
transcript_id | baseMean | log2FoldChange | lfcSE | stat | pvalue | padj | X.gene_id | sprot_Top_BLASTX_hit | RNAMMER | sprot_Top_BLASTP_hit | Pfam | SignalP | TmHMM | eggnog | Kegg | gene_ontology_blast | gene_ontology_pfam | transcript | peptide
<chr> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <chr> | <chr> | <chr> | <chr> | <chr> | <chr> | <chr> | <chr> | <chr> | <chr> | <chr> | <chr> | <chr>
1NODE_10049_length_2009_cov_1010.67_g4245_i1 144.9421193.7802060.80275684.7029332.564508e-066.123575e-05g4245ATS16_MOUSE^ATS16_MOUSE^Q:1751-417,H:93-572^25.662%ID^E:3.7e-36^RecName: Full=A disintegrin and metalloproteinase with thrombospondin motifs 16;^Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus .ADT1_CAEEL^ADT1_CAEEL^Q:53-403,H:141-533^27.114%ID^E:4.13e-35^RecName: Full=A disintegrin and metalloproteinase with thrombospondin motifs adt-1 {ECO:0000305};^Eukaryota; Metazoa; Ecdysozoa; Nematoda; Chromadorea; Rhabditida; Rhabditina; Rhabditomorpha; Rhabditoidea; Rhabditidae; Peloderinae; Caenorhabditis PF13688.6^Reprolysin_5^Metallo-peptidase family M12^131-300^E:5.7e-12`PF01421.19^Reprolysin^Reprolysin (M12B) family zinc metalloprotease^135-323^E:1.9e-15`PF13582.6^Reprolysin_3^Metallo-peptidase family M12B Reprolysin-like^142-274^E:4.1e-09`PF13583.6^Reprolysin_4^Metallo-peptidase family M12B Reprolysin-like^200-286^E:3.1e-06`PF13574.6^Reprolysin_2^Metallo-peptidase family M12B Reprolysin-like^219-311^E:3.9e-08`PF17771.1^ADAM_CR_2^ADAM cysteine-rich domain^339-403^E:1.2e-09`PF17771.1^ADAM_CR_2^ADAM cysteine-rich domain^430-498^E:5.6e-05. . ENOG41104P0^Thrombospondin type 1 domain KEGG:cel:CELE_C02B4.1 GO:0005576^cellular_component^extracellular region`GO:0046872^molecular_function^metal ion binding`GO:0004222^molecular_function^metalloendopeptidase activity GO:0004222^molecular_function^metalloendopeptidase activity`GO:0006508^biological_process^proteolysis ..
2NODE_10354_length_1968_cov_1026.02_g4245_i3 135.0139523.7634350.83991614.4741397.671972e-061.652155e-04g4245ADT1_CAEEL^ADT1_CAEEL^Q:1331-417,H:222-533^29.595%ID^E:1.59e-33^RecName: Full=A disintegrin and metalloproteinase with thrombospondin motifs adt-1 {ECO:0000305};^Eukaryota; Metazoa; Ecdysozoa; Nematoda; Chromadorea; Rhabditida; Rhabditina; Rhabditomorpha; Rhabditoidea; Rhabditidae; Peloderinae; Caenorhabditis .ADT1_CAEEL^ADT1_CAEEL^Q:4-308,H:222-533^29.595%ID^E:9.2e-36^RecName: Full=A disintegrin and metalloproteinase with thrombospondin motifs adt-1 {ECO:0000305};^Eukaryota; Metazoa; Ecdysozoa; Nematoda; Chromadorea; Rhabditida; Rhabditina; Rhabditomorpha; Rhabditoidea; Rhabditidae; Peloderinae; Caenorhabditis PF13688.6^Reprolysin_5^Metallo-peptidase family M12^36-205^E:3.5e-12`PF01421.19^Reprolysin^Reprolysin (M12B) family zinc metalloprotease^40-228^E:1.2e-15`PF13582.6^Reprolysin_3^Metallo-peptidase family M12B Reprolysin-like^47-179^E:2.7e-09`PF13583.6^Reprolysin_4^Metallo-peptidase family M12B Reprolysin-like^105-191^E:2e-06`PF13574.6^Reprolysin_2^Metallo-peptidase family M12B Reprolysin-like^124-216^E:2.6e-08`PF17771.1^ADAM_CR_2^ADAM cysteine-rich domain^244-308^E:8.7e-10`PF17771.1^ADAM_CR_2^ADAM cysteine-rich domain^335-403^E:4.1e-05 . . ENOG41104P0^Thrombospondin type 1 domain KEGG:cel:CELE_C02B4.1 GO:0005576^cellular_component^extracellular region`GO:0046872^molecular_function^metal ion binding`GO:0004222^molecular_function^metalloendopeptidase activity GO:0004222^molecular_function^metalloendopeptidase activity`GO:0006508^biological_process^proteolysis ..
3NODE_10887_length_1898_cov_63.2333_g6458_i2 5.1912545.5676291.07873904.9248048.444507e-072.286926e-05g6458. .. . sigP:1^19^0.889^YES. . . . . ..
4NODE_11282_length_1844_cov_0.340973_g7910_i0 3.0876534.5839591.19561963.7127172.050462e-043.148622e-03g7910ACH10_RAT^ACH10_RAT^Q:125-1204,H:32-389^28.689%ID^E:2.31e-36^RecName: Full=Neuronal acetylcholine receptor subunit alpha-10;^Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Rattus .ACH10_RAT^ACH10_RAT^Q:26-385,H:32-389^28.689%ID^E:1.38e-42^RecName: Full=Neuronal acetylcholine receptor subunit alpha-10;^Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Rattus PF02931.23^Neur_chan_LBD^Neurotransmitter-gated ion-channel ligand binding domain^26-236^E:5.3e-43 sigP:1^20^0.836^YESExpAA=93.16^PredHel=4^Topology=o241-263i270-289o299-321i389-411oENOG410XQGR^cholinergic receptor, nicotinicKEGG:rno:64574`KO:K04811 GO:0030424^cellular_component^axon`GO:0030054^cellular_component^cell junction`GO:0098981^cellular_component^cholinergic synapse`GO:0005887^cellular_component^integral component of plasma membrane`GO:0099060^cellular_component^integral component of postsynaptic specialization membrane`GO:0043005^cellular_component^neuron projection`GO:0043204^cellular_component^perikaryon`GO:0045202^cellular_component^synapse`GO:0022848^molecular_function^acetylcholine-gated cation-selective channel activity`GO:0005262^molecular_function^calcium channel activity`GO:0004888^molecular_function^transmembrane signaling receptor activity`GO:1904315^molecular_function^transmitter-gated ion channel activity involved in regulation of postsynaptic membrane potential`GO:0007268^biological_process^chemical synaptic transmission`GO:0050910^biological_process^detection of mechanical stimulus involved in sensory perception of sound`GO:0042472^biological_process^inner ear morphogenesis`GO:0034220^biological_process^ion transmembrane 
transport`GO:0070373^biological_process^negative regulation of ERK1 and ERK2 cascade`GO:0050877^biological_process^nervous system process`GO:0007204^biological_process^positive regulation of cytosolic calcium ion concentration`GO:0042391^biological_process^regulation of membrane potential`GO:0007165^biological_process^signal transduction`GO:0007271^biological_process^synaptic transmission, cholinergicGO:0005230^molecular_function^extracellular ligand-gated ion channel activity`GO:0006811^biological_process^ion transport`GO:0016021^cellular_component^integral component of membrane ..
5NODE_11982_length_1756_cov_29.0747_g8390_i0 123.7553364.2622101.23046273.4575745.450631e-047.303192e-03g8390P4HA2_CHICK^P4HA2_CHICK^Q:33-1589,H:1-529^30.292%ID^E:1.86e-67^RecName: Full=Prolyl 4-hydroxylase subunit alpha-2;^Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Archelosauria; Archosauria; Dinosauria; Saurischia; Theropoda; Coelurosauria; Aves; Neognathae; Galloanserae; Galliformes; Phasianidae; Phasianinae; Gallus .P4HA2_CHICK^P4HA2_CHICK^Q:1-519,H:1-529^30.474%ID^E:4.96e-71^RecName: Full=Prolyl 4-hydroxylase subunit alpha-2;^Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Archelosauria; Archosauria; Dinosauria; Saurischia; Theropoda; Coelurosauria; Aves; Neognathae; Galloanserae; Galliformes; Phasianidae; Phasianinae; GallusPF08336.11^P4Ha_N^Prolyl 4-Hydroxylase alpha-subunit, N-terminal region^24-150^E:2.1e-21`PF13640.6^2OG-FeII_Oxy_3^2OG-Fe(II) oxygenase superfamily^403-508^E:7.1e-14 sigP:1^20^0.682^YES. ENOG410XS5J^prolyl 4-hydroxylase KEGG:gga:416326`KO:K00472GO:0005783^cellular_component^endoplasmic reticulum`GO:0005788^cellular_component^endoplasmic reticulum lumen`GO:0005506^molecular_function^iron ion binding`GO:0031418^molecular_function^L-ascorbic acid binding`GO:0016702^molecular_function^oxidoreductase activity, acting on single donors with incorporation of molecular oxygen, incorporation of two atoms of oxygen`GO:0004656^molecular_function^procollagen-proline 4-dioxygenase activity`GO:0018401^biological_process^peptidyl-proline hydroxylation to 4-hydroxy-L-proline GO:0004656^molecular_function^procollagen-proline 4-dioxygenase activity`GO:0016702^molecular_function^oxidoreductase activity, acting on single donors with incorporation of molecular oxygen, incorporation of two atoms of oxygen`GO:0055114^biological_process^oxidation-reduction process`GO:0005783^cellular_component^endoplasmic reticulum`GO:0016491^molecular_function^oxidoreductase activity..
6NODE_12136_length_1738_cov_717.411_g4245_i8 77.9554983.3366740.88115353.7788881.575305e-042.484441e-03g4245ADT1_CAEEL^ADT1_CAEEL^Q:1042-1227,H:341-396^37.097%ID^E:7.86e-11^RecName: Full=A disintegrin and metalloproteinase with thrombospondin motifs adt-1 {ECO:0000305};^Eukaryota; Metazoa; Ecdysozoa; Nematoda; Chromadorea; Rhabditida; Rhabditina; Rhabditomorpha; Rhabditoidea; Rhabditidae; Peloderinae; Caenorhabditis`ADT1_CAEEL^ADT1_CAEEL^Q:1239-1631,H:403-533^30.282%ID^E:7.86e-11^RecName: Full=A disintegrin and metalloproteinase with thrombospondin motifs adt-1 {ECO:0000305};^Eukaryota; Metazoa; Ecdysozoa; Nematoda; Chromadorea; Rhabditida; Rhabditina; Rhabditomorpha; Rhabditoidea; Rhabditidae; Peloderinae; Caenorhabditis.. . . . . . . . ..
In [197]:
nrow(vtsujii_bio_upper_lip_orthogroups_transcript_ids)
24
In [198]:
#write.csv(vtsujii_bio_upper_lip_orthogroups_transcript_ids, file = "vtsujii_bio_upper_lip_orthogroups_transcript_ids.csv")
In [285]:
shared_bio_upper_lip_transcript_ids <- vtsujii_bio_upper_lip_orthogroups_transcript_ids$transcript_id
In [286]:
shared_bio_upper_lip_transcript_ids
  1. 'NODE_10049_length_2009_cov_1010.67_g4245_i1'
  2. 'NODE_10354_length_1968_cov_1026.02_g4245_i3'
  3. 'NODE_10887_length_1898_cov_63.2333_g6458_i2'
  4. 'NODE_11282_length_1844_cov_0.340973_g7910_i0'
  5. 'NODE_11982_length_1756_cov_29.0747_g8390_i0'
  6. 'NODE_12136_length_1738_cov_717.411_g4245_i8'
  7. 'NODE_1240_length_4636_cov_3.35691_g868_i0'
  8. 'NODE_13699_length_1559_cov_423.459_g9597_i0'
  9. 'NODE_1377_length_4491_cov_2.18012_g955_i0'
  10. 'NODE_17313_length_1212_cov_5.50994_g6842_i1'
  11. 'NODE_17756_length_1175_cov_1962.42_g4245_i11'
  12. 'NODE_21998_length_891_cov_153.951_g15967_i0'
  13. 'NODE_24529_length_771_cov_267.937_g18141_i0'
  14. 'NODE_2636_length_3709_cov_82.7176_g1142_i1'
  15. 'NODE_39804_length_400_cov_2.29565_g32617_i0'
  16. 'NODE_45584_length_342_cov_2.35192_g38364_i0'
  17. 'NODE_5877_length_2713_cov_48.2983_g4177_i0'
  18. 'NODE_5975_length_2694_cov_957.783_g4245_i0'
  19. 'NODE_6046_length_2683_cov_0.0684932_g4293_i0'
  20. 'NODE_6278_length_2642_cov_0.189795_g4467_i0'
  21. 'NODE_8529_length_2238_cov_82.9111_g5617_i1'
  22. 'NODE_9862_length_2035_cov_23.9697_g6956_i0'
  23. 'NODE_3114_length_3481_cov_47.0756_g2183_i0'
  24. 'NODE_36069_length_452_cov_5.25693_g28946_i0'

Extract all Skogsbergia sp. significantly upregulated (non-luminous) upper lip genes from orthogroups shared across both upper lips

In [199]:
#now extract all transcripts 

vargulatsujii_v_skogsbergia_SHARED_ORTHOGROUPS_skogs_UPPER_LIP_transcript_ids <- as.character(vargulatsujii_v_skogsbergia_SHARED_ORTHOGROUPS_UPPER_LIPS$Skogsbergia_sp)
In [200]:
vargulatsujii_v_skogsbergia_SHARED_ORTHOGROUPS_skogs_UPPER_LIP_transcript_ids_unlist  <- unlist(strsplit(vargulatsujii_v_skogsbergia_SHARED_ORTHOGROUPS_skogs_UPPER_LIP_transcript_ids,","))
In [201]:
#remove the TransDecoder ORF suffixes (.p1, .p2) to recover transcript IDs ("." is a regex wildcard here)
vargulatsujii_v_skogsbergia_SHARED_ORTHOGROUPS_skogs_UPPER_LIP_transcript_ids_unlist_removep <- vargulatsujii_v_skogsbergia_SHARED_ORTHOGROUPS_skogs_UPPER_LIP_transcript_ids_unlist %>% str_replace("(.p1)", "") %>% 
str_replace("(.p2)", "")
In [202]:
#remove white spaces 

vargulatsujii_v_skogsbergia_SHARED_ORTHOGROUPS_skogs_UPPER_LIP_transcript_ids_unlist_removep_trimws <- trimws(vargulatsujii_v_skogsbergia_SHARED_ORTHOGROUPS_skogs_UPPER_LIP_transcript_ids_unlist_removep)
In [203]:
skogs_upper_lip_orthogroups_transcript_ids <- unique_genes_upper_lip_info_annot %>% filter (transcript_id %in% vargulatsujii_v_skogsbergia_SHARED_ORTHOGROUPS_skogs_UPPER_LIP_transcript_ids_unlist_removep_trimws)
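Flattening the `Skogsbergia_sp` column loses the link between each transcript and its orthogroup. Keeping a long (Orthogroup, transcript) table makes it possible to trace each shared Skogsbergia sp. transcript back to its orthogroup. A sketch with a hypothetical helper, `orthogroup_long()`, on a toy subset modeled on the shared-orthogroup table above:

```r
# One row per (Orthogroup, transcript) pair, with ".pN" suffixes removed
orthogroup_long <- function(ortho_df, col) {
  pieces <- strsplit(as.character(ortho_df[[col]]), ",")
  data.frame(
    Orthogroup    = rep(ortho_df$Orthogroup, lengths(pieces)),
    transcript_id = sub("\\.p[0-9]+$", "", trimws(unlist(pieces, use.names = FALSE))),
    stringsAsFactors = FALSE
  )
}

# Toy subset (simplified from the shared-orthogroup table)
toy_shared <- data.frame(
  Orthogroup = c("OG0000094", "OG0000094", "OG0000051"),
  Skogsbergia_sp = c("TRINITY_DN16483_c0_g1_i1.p1",
                     "TRINITY_DN6920_c0_g1_i1.p1",
                     "TRINITY_DN8220_c0_g3_i1.p2, TRINITY_DN8220_c0_g3_i1.p1, TRINITY_DN8220_c0_g7_i1.p1"),
  stringsAsFactors = FALSE
)

# unique() collapses the .p1/.p2 peptides of the same transcript
long <- unique(orthogroup_long(toy_shared, "Skogsbergia_sp"))

# Hypothetical DE transcripts traced back to their orthogroups
de_ids <- c("TRINITY_DN16483_c0_g1_i1", "TRINITY_DN8220_c0_g3_i1", "TRINITY_DN8220_c0_g7_i1")
traced <- merge(data.frame(transcript_id = de_ids, stringsAsFactors = FALSE),
                long, by = "transcript_id")
print(traced)
```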
In [19]:
head(skogs_upper_lip_orthogroups_transcript_ids)
A tibble: 6 × 23
transcript_id | baseMean | log2FoldChange | lfcSE | stat | pvalue | padj | #gene_id | sprot_Top_BLASTX_hit | RNAMMER | Pfam | SignalP | TmHMM | eggnog | Kegg | gene_ontology_BLASTX | gene_ontology_BLASTP | gene_ontology_Pfam | transcript | peptide
<chr> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <chr> | <chr> | <chr> | <chr> | <chr> | <chr> | <chr> | <chr> | <chr> | <chr> | <chr> | <chr> | <chr>
TRINITY_DN10274_c1_g2_i1 13.9893223.4427020.76899874.4538248.435425e-062.365771e-04TRINITY_DN10274_c1_g2CDKL1_DANRE^CDKL1_DANRE^Q:126-1193,H:1-350^61.345%ID^E:1.14e-140^RecName: Full=Cyclin-dependent kinase-like 1 {ECO:0000250|UniProtKB:Q00532};^Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Actinopterygii; Neopterygii; Teleostei; Ostariophysi; Cypriniformes; Danionidae; Danioninae; Danio .PF00069.28^Pkinase^Protein kinase domain^4-287^E:1.9e-81`PF07714.20^PK_Tyr_Ser-Thr^Protein tyrosine and serine/threonine kinase^7-214^E:7.7e-35 . ..KEGG:dre:445316 GO:0005737^cellular_component^cytoplasm`GO:0005634^cellular_component^nucleus`GO:0005524^molecular_function^ATP binding`GO:0004693^molecular_function^cyclin-dependent protein serine/threonine kinase activity`GO:0004672^molecular_function^protein kinase activity`GO:0106310^molecular_function^protein serine kinase activity`GO:0004674^molecular_function^protein serine/threonine kinase activity`GO:0004712^molecular_function^protein serine/threonine/tyrosine kinase activity`GO:0006468^biological_process^protein phosphorylation GO:0005737^cellular_component^cytoplasm`GO:0005634^cellular_component^nucleus`GO:0005524^molecular_function^ATP binding`GO:0004693^molecular_function^cyclin-dependent protein serine/threonine kinase activity`GO:0004672^molecular_function^protein kinase activity`GO:0106310^molecular_function^protein serine kinase activity`GO:0004674^molecular_function^protein serine/threonine kinase activity`GO:0004712^molecular_function^protein serine/threonine/tyrosine kinase activity`GO:0006468^biological_process^protein phosphorylation GO:0004672^molecular_function^protein kinase activity`GO:0005524^molecular_function^ATP binding`GO:0006468^biological_process^protein phosphorylation ..
TRINITY_DN12281_c0_g1_i2 9.3258293.0163640.85595603.5058994.550687e-046.255200e-03TRINITY_DN12281_c0_g1ACH10_CHICK^ACH10_CHICK^Q:1226-408,H:34-304^31.769%ID^E:3.61e-31^RecName: Full=Neuronal acetylcholine receptor subunit alpha-10;^Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Archelosauria; Archosauria; Dinosauria; Saurischia; Theropoda; Coelurosauria; Aves; Neognathae; Galloanserae; Galliformes; Phasianidae; Phasianinae; Gallus .PF02931.26^Neur_chan_LBD^Neurotransmitter-gated ion-channel ligand binding domain^37-242^E:1.4e-51 sigP:1^18^0.896..KEGG:gga:430628 GO:0005887^cellular_component^integral component of plasma membrane`GO:0043005^cellular_component^neuron projection`GO:0045211^cellular_component^postsynaptic membrane`GO:0045202^cellular_component^synapse`GO:0022848^molecular_function^acetylcholine-gated cation-selective channel activity`GO:0005262^molecular_function^calcium channel activity`GO:0030594^molecular_function^neurotransmitter receptor activity`GO:0004888^molecular_function^transmembrane signaling receptor activity`GO:0007268^biological_process^chemical synaptic transmission`GO:0034220^biological_process^ion transmembrane transport`GO:0051899^biological_process^membrane depolarization`GO:0050877^biological_process^nervous system process`GO:0042391^biological_process^regulation of membrane potential`GO:0007165^biological_process^signal transductionGO:0005887^cellular_component^integral component of plasma membrane`GO:0043005^cellular_component^neuron projection`GO:0045211^cellular_component^postsynaptic membrane`GO:0045202^cellular_component^synapse`GO:0022848^molecular_function^acetylcholine-gated cation-selective channel activity`GO:0005262^molecular_function^calcium channel activity`GO:0030594^molecular_function^neurotransmitter receptor activity`GO:0004888^molecular_function^transmembrane signaling receptor activity`GO:0007268^biological_process^chemical synaptic transmission`GO:0034220^biological_process^ion transmembrane 
transport`GO:0051899^biological_process^membrane depolarization`GO:0050877^biological_process^nervous system process`GO:0042391^biological_process^regulation of membrane potential`GO:0007165^biological_process^signal transductionGO:0005230^molecular_function^extracellular ligand-gated ion channel activity`GO:0006811^biological_process^ion transport`GO:0016021^cellular_component^integral component of membrane..
TRINITY_DN1254_c0_g1_i1 8.6962782.8701880.89462523.1877601.433795e-031.562799e-02TRINITY_DN1254_c0_g1 PLMN_PONAB^PLMN_PONAB^Q:476-234,H:108-181^43.21%ID^E:9.08e-11^RecName: Full=Plasminogen;^Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Pongo`PLMN_PONAB^PLMN_PONAB^Q:509-234,H:183-262^37.634%ID^E:2.44e-06^RecName: Full=Plasminogen;^Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Pongo`PLMN_PONAB^PLMN_PONAB^Q:476-234,H:280-352^37.805%ID^E:3.97e-06^RecName: Full=Plasminogen;^Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Pongo.PF07648.18^Kazal_2^Kazal-type serine protease inhibitor domain^33-76^E:0.00012`PF00050.24^Kazal_1^Kazal-type serine protease inhibitor domain^35-69^E:0.00013`PF01549.27^ShK^ShK domain-like^44-59^E:14000`PF00051.21^Kringle^Kringle domain^69-151^E:6.1e-17`PF01549.27^ShK^ShK domain-like^86-93^E:3300`PF01549.27^ShK^ShK domain-like^119-125^E:4500`PF01549.27^ShK^ShK domain-like^168-205^E:2.8e-05sigP:1^15^0.848..KEGG:pon:100172984GO:0005576^cellular_component^extracellular region`GO:0004252^molecular_function^serine-type endopeptidase activity`GO:0007596^biological_process^blood coagulation`GO:0042730^biological_process^fibrinolysis`GO:0048771^biological_process^tissue remodeling GO:0005576^cellular_component^extracellular region`GO:0004252^molecular_function^serine-type endopeptidase activity`GO:0007596^biological_process^blood coagulation`GO:0042730^biological_process^fibrinolysis`GO:0048771^biological_process^tissue remodeling GO:0005515^molecular_function^protein binding ..
TRINITY_DN1272_c0_g1_i1 192.7163264.3140830.84603145.0986013.421731e-071.629169e-05TRINITY_DN1272_c0_g1 . .. . ... . . . ..
TRINITY_DN1272_c0_g3_i1 106.7713604.1944321.06675923.9389138.185164e-051.566146e-03TRINITY_DN1272_c0_g3 . .. sigP:1^22^0.871... . . . ..
TRINITY_DN1272_c0_g4_i1 9.6294193.5224480.99258773.5203744.309393e-045.997347e-03TRINITY_DN1272_c0_g4 . .. sigP:1^27^0.788... . . . ..
In [421]:
#write.csv(skogs_upper_lip_orthogroups_transcript_ids, file = "skogs_upper_lip_orthogroups_transcript_ids.csv")
In [205]:
nrow(skogs_upper_lip_orthogroups_transcript_ids)
22
In [206]:
shared_skogs_transcript_ids <- skogs_upper_lip_orthogroups_transcript_ids$transcript_id
In [207]:
shared_skogs_transcript_ids
  1. 'TRINITY_DN10274_c1_g2_i1'
  2. 'TRINITY_DN12281_c0_g1_i2'
  3. 'TRINITY_DN1254_c0_g1_i1'
  4. 'TRINITY_DN1272_c0_g1_i1'
  5. 'TRINITY_DN1272_c0_g3_i1'
  6. 'TRINITY_DN1272_c0_g4_i1'
  7. 'TRINITY_DN1331_c0_g1_i8'
  8. 'TRINITY_DN14523_c0_g1_i2'
  9. 'TRINITY_DN14782_c0_g1_i2'
  10. 'TRINITY_DN15549_c0_g1_i1'
  11. 'TRINITY_DN16483_c0_g1_i1'
  12. 'TRINITY_DN2267_c0_g1_i1'
  13. 'TRINITY_DN29879_c0_g1_i1'
  14. 'TRINITY_DN40233_c0_g1_i1'
  15. 'TRINITY_DN4065_c0_g1_i6'
  16. 'TRINITY_DN42551_c0_g1_i1'
  17. 'TRINITY_DN609_c0_g1_i1'
  18. 'TRINITY_DN7748_c0_g1_i4'
  19. 'TRINITY_DN8220_c0_g3_i1'
  20. 'TRINITY_DN8220_c0_g7_i1'
  21. 'TRINITY_DN8293_c1_g1_i1'
  22. 'TRINITY_DN68_c0_g2_i2'

DGE - GO enrichment analyses for upper lip

Use topGO to identify enriched biological processes in the upper lip

In [208]:
#import the gene-to-GO mappings exported from the Trinotate annotation
geneID2GO <- readMappings(file ="go_annotations_cdhit90_longestisoform.txt")
In [209]:
#save the transcript IDs of all annotated genes (the gene universe) as geneNames
geneNames <- as.character(Trinotate_lym_subset_skogs$transcript_id)
In [211]:
#significantly upregulated genes (i.e. expressed uniquely) in the upper lip 
head(unique_genes_upper_lip_info)
A tibble: 6 × 7
transcript_id | baseMean | log2FoldChange | lfcSE | stat | pvalue | padj
<chr> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl> | <dbl>
TRINITY_DN100127_c0_g1_i1 | 3.608228 | 2.729312 | 0.9341722 | 2.880301 | 3.972962e-03 | 3.335116e-02
TRINITY_DN10031_c3_g1_i1 | 16.544037 | 1.833604 | 0.3699002 | 4.954538 | 7.250222e-07 | 3.064050e-05
TRINITY_DN100493_c0_g1_i1 | 4.249253 | 3.571458 | 1.1454913 | 3.046927 | 2.311939e-03 | 2.223371e-02
TRINITY_DN10077_c0_g2_i5 | 22.154294 | 1.221858 | 0.4466248 | 2.735470 | 6.229131e-03 | 4.641496e-02
TRINITY_DN10099_c0_g1_i3 | 28.666973 | 2.985002 | 0.9135207 | 3.270500 | 1.073577e-03 | 1.242571e-02
TRINITY_DN10116_c0_g1_i1 | 17.029354 | 5.542207 | 1.0355757 | 5.142149 | 2.716139e-07 | 1.331053e-05
In [212]:
#save the transcript IDs of the genes of interest (significantly upregulated in the upper lip)
myInterestingGenes <- as.character(unique_genes_upper_lip_info$transcript_id)
In [214]:
#flag each gene in the universe: 1 if it is significantly upregulated in the upper lip, 0 otherwise
geneList <- factor(as.integer(geneNames %in% myInterestingGenes))
names(geneList) <- geneNames
head(geneList)
TRINITY_DN0_c0_g1_i2
0
TRINITY_DN0_c0_g3_i1
0
TRINITY_DN0_c0_g4_i2
0
TRINITY_DN100000_c0_g1_i1
0
TRINITY_DN100001_c0_g1_i1
0
TRINITY_DN100002_c0_g1_i1
0
Levels:
  1. '0'
  2. '1'
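`geneList` is a named 0/1 factor over the gene universe. A toy sketch of the same construction with hypothetical IDs: fixing `levels = c(0, 1)` keeps the encoding explicit even if no gene were flagged, and `setdiff()` shows which genes of interest are absent from the universe and therefore cannot enter the enrichment test.

```r
# Hypothetical universe and genes of interest
all_genes   <- c("TRINITY_DN0_c0_g1_i2", "TRINITY_DN0_c0_g3_i1",
                 "TRINITY_DN10031_c3_g1_i1", "TRINITY_DN10116_c0_g1_i1")
interesting <- c("TRINITY_DN10031_c3_g1_i1", "TRINITY_DN10116_c0_g1_i1",
                 "TRINITY_DN_not_in_universe")

# Named 0/1 factor, as passed to topGO via allGenes
gene_list <- factor(as.integer(all_genes %in% interesting), levels = c(0, 1))
names(gene_list) <- all_genes

print(table(gene_list))

# Genes of interest missing from the universe are silently left out
missing_from_universe <- setdiff(interesting, all_genes)
print(missing_from_universe)
```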
In [215]:
#build the topGOdata object for the Biological Process (BP) ontology
GOdata <- new("topGOdata", ontology = "BP", allGenes = geneList,
                    annot = annFUN.gene2GO, gene2GO = geneID2GO)
Building most specific GOs .....

	( 15239 GO terms found. )


Build GO DAG topology ..........

	( 15990 GO terms and 37503 relations. )


Annotating nodes ...............

	( 25887 genes annotated to the GO terms. )

In [216]:
results_go <- runTest(GOdata, algorithm="weight01", statistic="Fisher")
			 -- Weight01 Algorithm -- 

		 the algorithm is scoring 2274 nontrivial nodes
		 parameters: 
			 test statistic: fisher


	 Level 19:	1 nodes to be scored	(0 eliminated genes)


	 Level 18:	1 nodes to be scored	(0 eliminated genes)


	 Level 17:	1 nodes to be scored	(2 eliminated genes)


	 Level 16:	3 nodes to be scored	(13 eliminated genes)


	 Level 15:	12 nodes to be scored	(25 eliminated genes)


	 Level 14:	22 nodes to be scored	(133 eliminated genes)


	 Level 13:	41 nodes to be scored	(497 eliminated genes)


	 Level 12:	75 nodes to be scored	(1685 eliminated genes)


	 Level 11:	119 nodes to be scored	(4443 eliminated genes)


	 Level 10:	178 nodes to be scored	(7832 eliminated genes)


	 Level 9:	249 nodes to be scored	(10815 eliminated genes)


	 Level 8:	290 nodes to be scored	(13231 eliminated genes)


	 Level 7:	358 nodes to be scored	(15895 eliminated genes)


	 Level 6:	362 nodes to be scored	(20011 eliminated genes)


	 Level 5:	292 nodes to be scored	(22509 eliminated genes)


	 Level 4:	166 nodes to be scored	(24221 eliminated genes)


	 Level 3:	84 nodes to be scored	(25132 eliminated genes)


	 Level 2:	19 nodes to be scored	(25518 eliminated genes)


	 Level 1:	1 nodes to be scored	(25827 eliminated genes)
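For a single GO term, topGO's `Expected` column is Annotated × (number of genes of interest / number of annotated genes), and the classic algorithm tests the term with a one-sided Fisher's exact test on a 2×2 table. The weight01 algorithm used above adjusts the gene sets along the GO graph, so as we understand it, its p-values need not match a classic test computed from the `Annotated` and `Significant` counts; none of these p-values are adjusted for multiple testing. A sketch of the classic test on hypothetical counts, cross-checked against the hypergeometric upper tail:

```r
# Hypothetical counts for one GO term
n_universe    <- 1000  # annotated genes in the universe
n_selected    <- 50    # genes of interest among them
n_annotated   <- 40    # universe genes annotated to the term
n_sig_in_term <- 8     # genes of interest annotated to the term

# 2x2 table: rows = selected / not selected, columns = in term / not in term
contingency <- matrix(
  c(n_sig_in_term, n_selected - n_sig_in_term,
    n_annotated - n_sig_in_term, n_universe - n_annotated - n_selected + n_sig_in_term),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("selected", "not_selected"), c("in_term", "not_in_term"))
)

# topGO's "Expected" column for this term
expected <- n_annotated * n_selected / n_universe

# One-sided test for over-representation
p_classic <- fisher.test(contingency, alternative = "greater")$p.value

# Same quantity as a hypergeometric upper tail, P(X >= n_sig_in_term)
p_hyper <- phyper(n_sig_in_term - 1, n_annotated, n_universe - n_annotated,
                  n_selected, lower.tail = FALSE)
```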

In [217]:
#retrieve the GO enrichment 
goEnrichment   <- GenTable(GOdata, Fisher = results_go, orderBy = "Fisher", topNodes = 10000, numChar=1000)
In [265]:
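#GenTable() returns p-values as character strings, so convert them to numeric
goEnrichment$Fisher <- as.numeric(goEnrichment$Fisher)
#keep terms with an (unadjusted) weight01 p-value below 0.05 and at least two genes of interest
goEnrichment <- goEnrichment[goEnrichment$Fisher < 0.05,] 
goEnrichment <- goEnrichment[goEnrichment$Significant > 1,] 
goEnrichment <- goEnrichment[,c("GO.ID","Term", "Annotated", "Significant", "Expected", "Fisher")]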
goEnrichment
A data.frame: 50 × 6
  | GO.ID | Term | Annotated | Significant | Expected | Fisher
  | <chr> | <chr> | <int> | <int> | <dbl> | <dbl>
1 | GO:0018401 | peptidyl-proline hydroxylation to 4-hydroxy-L-proline | 111 | 6 | 0.49 | 0.000010
2 | GO:0006508 | proteolysis | 2482 | 20 | 11.03 | 0.000012
3 | GO:0034220 | ion transmembrane transport | 1400 | 7 | 6.22 | 0.000270
4 | GO:0048047 | mating behavior, sex discrimination | 6 | 2 | 0.03 | 0.000290
5 | GO:0007613 | memory | 268 | 6 | 1.19 | 0.000330
6 | GO:0015939 | pantothenate metabolic process | 10 | 2 | 0.04 | 0.000860
7 | GO:0042532 | negative regulation of tyrosine phosphorylation of STAT protein | 10 | 2 | 0.04 | 0.000860
8 | GO:0048149 | behavioral response to ethanol | 59 | 3 | 0.26 | 0.002320
9 | GO:0007421 | stomatogastric nervous system development | 19 | 2 | 0.08 | 0.003180
10 | GO:1900005 | positive regulation of serine-type endopeptidase activity | 20 | 2 | 0.09 | 0.003530
11 | GO:0050819 | negative regulation of coagulation | 75 | 4 | 0.33 | 0.003770
12 | GO:0038004 | epidermal growth factor receptor ligand maturation | 21 | 2 | 0.09 | 0.003890
13 | GO:0045752 | positive regulation of Toll signaling pathway | 21 | 2 | 0.09 | 0.003890
14 | GO:0044719 | regulation of imaginal disc-derived wing size | 22 | 2 | 0.10 | 0.004260
20 | GO:0043703 | photoreceptor cell fate determination | 24 | 2 | 0.11 | 0.005070
21 | GO:0035225 | determination of genital disc primordium | 25 | 2 | 0.11 | 0.005490
22 | GO:0048263 | determination of dorsal identity | 27 | 2 | 0.12 | 0.006390
23 | GO:0006629 | lipid metabolic process | 2132 | 12 | 9.47 | 0.006630
24 | GO:1900242 | regulation of synaptic vesicle endocytosis | 28 | 2 | 0.12 | 0.006860
25 | GO:0050877 | nervous system process | 1711 | 19 | 7.60 | 0.007360
26 | GO:0007584 | response to nutrient | 250 | 6 | 1.11 | 0.007840
34 | GO:0061331 | epithelial cell proliferation involved in Malpighian tubule morphogenesis | 33 | 2 | 0.15 | 0.009440
35 | GO:0030030 | cell projection organization | 2577 | 13 | 11.45 | 0.009490
36 | GO:0006641 | triglyceride metabolic process | 155 | 4 | 0.69 | 0.010200
37 | GO:0052547 | regulation of peptidase activity | 369 | 6 | 1.64 | 0.012880
38 | GO:0007605 | sensory perception of sound | 314 | 5 | 1.39 | 0.013220
47 | GO:0032504 | multicellular organism reproduction | 1697 | 13 | 7.54 | 0.014690
48 | GO:0042749 | regulation of circadian sleep/wake cycle | 42 | 2 | 0.19 | 0.015000
49 | GO:0009880 | embryonic pattern specification | 209 | 4 | 0.93 | 0.015360
50 | GO:0046667 | compound eye retinal cell programmed cell death | 43 | 2 | 0.19 | 0.015690
51 | GO:0031397 | negative regulation of protein ubiquitination | 119 | 3 | 0.53 | 0.016110
52 | GO:0007268 | chemical synaptic transmission | 1015 | 10 | 4.51 | 0.017330
60 | GO:0051899 | membrane depolarization | 124 | 3 | 0.55 | 0.017960
61 | GO:0050714 | positive regulation of protein secretion | 228 | 4 | 1.01 | 0.018490
62 | GO:0015074 | DNA integration | 228 | 4 | 1.01 | 0.018960
63 | GO:0031116 | positive regulation of microtubule polymerization | 49 | 2 | 0.22 | 0.020080
64 | GO:0046843 | dorsal appendage formation | 49 | 2 | 0.22 | 0.020080
65 | GO:0030198 | extracellular matrix organization | 486 | 7 | 2.16 | 0.021200
66 | GO:0008049 | male courtship behavior | 51 | 2 | 0.23 | 0.021650
73 | GO:0007474 | imaginal disc-derived wing vein specification | 54 | 2 | 0.24 | 0.024090
86 | GO:0030178 | negative regulation of Wnt signaling pathway | 199 | 3 | 0.88 | 0.026480
87 | GO:0048865 | stem cell fate commitment | 57 | 2 | 0.25 | 0.026640
88 | GO:2000369 | regulation of clathrin-dependent endocytosis | 59 | 2 | 0.26 | 0.028400
97 | GO:0016318 | ommatidial rotation | 63 | 2 | 0.28 | 0.032040
98 | GO:0030193 | regulation of blood coagulation | 86 | 3 | 0.38 | 0.034500
106 | GO:0009950 | dorsal/ventral axis specification | 67 | 2 | 0.30 | 0.035860
110 | GO:0051384 | response to glucocorticoid | 170 | 3 | 0.76 | 0.040280
111 | GO:0052548 | regulation of endopeptidase activity | 349 | 5 | 1.55 | 0.042620
113 | GO:0072359 | circulatory system development | 1515 | 4 | 6.73 | 0.045010
114 | GO:0018149 | peptide cross-linking | 76 | 2 | 0.34 | 0.045060
In [273]:
#write.table(goEnrichment, "df_TopGO_Skogsbergia_sp_DE_unique_Upper_Lip_BP.tsv",sep = "\t", quote=FALSE)
In [267]:
myterms =goEnrichment$GO.ID 
mygenes = genesInTerm(GOdata, myterms)
In [ ]:
# For each GO term, list the DE transcripts (myInterestingGenes) annotated to it
var=c()
for (i in 1:length(myterms))
{
   myterm <- myterms[i]
   mygenesforterm <- mygenes[myterm][[1]]
   myfactor <- mygenesforterm %in% myInterestingGenes
   mygenesforterm2 <- mygenesforterm[myfactor == TRUE]
   mygenesforterm2 <- paste(mygenesforterm2, collapse=',')
   var[i]=paste(myterm,"genes:",mygenesforterm2)
}
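The loop above can also be written without growing `var` one element at a time. Below is a minimal sketch. It assumes `mygenes` is a named list of transcript IDs per GO term, as returned by topGO's `genesInTerm()`, and that `myInterestingGenes` is a character vector of DE transcripts. The toy values (`term_A`, `tx1`, ...) are hypothetical placeholders.

```r
# Hypothetical stand-ins for genesInTerm() output and the DE transcript set
mygenes <- list(
  term_A = c("tx1", "tx2", "tx3"),
  term_B = c("tx2", "tx4")
)
myInterestingGenes <- c("tx2", "tx3")

# For each term, keep only the annotated transcripts that are also DE
sig_by_term <- lapply(mygenes, function(g) g[g %in% myInterestingGenes])

# Build one "<term> genes: id1,id2" string per term, as the loop does
var <- vapply(names(sig_by_term), function(term) {
  paste(term, "genes:", paste(sig_by_term[[term]], collapse = ","))
}, character(1), USE.NAMES = FALSE)

print(var)
```

Keeping `sig_by_term` as a list also leaves the per-term transcript IDs available for later use without re-parsing the strings.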
In [246]:
# GO enrichment Upper Lip - BP 

ntop = 50
ggdata <- goEnrichment[1:ntop,]
ggdata$Term <- factor(ggdata$Term, levels = rev(ggdata$Term)) 
plot_GO_UL_BP <- ggplot(ggdata,
                        aes(x = Term, y = -log10(Fisher), size = Significant, fill = -log10(Fisher))) +
  expand_limits(y = 1) +
  geom_point(shape = 21) +
  scale_size(range = c(2, 7)) +
  scale_fill_continuous(low = 'royalblue', high = 'red4') +
  labs(title = 'GO Analysis - Biological Process', size = "Number of Genes") +
  theme_bw(base_size = 24) +
  theme(
    legend.position = 'right',
    legend.background = element_rect(),
    plot.title = element_text(angle = 0, size = 16, face = 'bold', vjust = 1),
    plot.subtitle = element_text(angle = 0, size = 10, face = 'bold', vjust = 1),
    plot.caption = element_text(angle = 0, size = 12, face = 'bold', vjust = 1),
    axis.text.x = element_blank(),
    axis.text.y = element_text(angle = 0, size = 13, face = 'bold', vjust = 0.5),
    axis.title.x = element_blank(),
    axis.title.y = element_text(size = 8, face = 'bold'),
    axis.line = element_line(colour = 'black'),
    legend.key = element_blank(),
    legend.text = element_text(size = 14, face = "bold"),
    title = element_text(size = 14, face = "bold")
  ) +
  coord_flip()

#dev.off()
In [249]:
plot_GO_UL_BP + labs(x = NULL)
In [248]:
#library (repr)
options(repr.plot.width=12, repr.plot.height=8, repr.plot.res = 500)

Use topGO to identify enriched molecular functions in the upper lip

In [250]:
#run the topGO function 
GOdata_MF <- new("topGOdata", ontology = "MF", allGenes = geneList,
                    annot = annFUN.gene2GO, gene2GO = geneID2GO)
Building most specific GOs .....

	( 4594 GO terms found. )


Build GO DAG topology ..........

	( 4632 GO terms and 6032 relations. )


Annotating nodes ...............

	( 25806 genes annotated to the GO terms. )

In [251]:
results_go_MF <- runTest(GOdata_MF, algorithm="weight01", statistic="Fisher")
			 -- Weight01 Algorithm -- 

		 the algorithm is scoring 397 nontrivial nodes
		 parameters: 
			 test statistic: fisher


	 Level 12:	3 nodes to be scored	(0 eliminated genes)


	 Level 11:	4 nodes to be scored	(0 eliminated genes)


	 Level 10:	5 nodes to be scored	(45 eliminated genes)


	 Level 9:	15 nodes to be scored	(260 eliminated genes)


	 Level 8:	23 nodes to be scored	(2465 eliminated genes)


	 Level 7:	57 nodes to be scored	(6712 eliminated genes)


	 Level 6:	71 nodes to be scored	(7530 eliminated genes)


	 Level 5:	88 nodes to be scored	(10807 eliminated genes)


	 Level 4:	81 nodes to be scored	(14917 eliminated genes)


	 Level 3:	39 nodes to be scored	(21292 eliminated genes)


	 Level 2:	10 nodes to be scored	(23253 eliminated genes)


	 Level 1:	1 nodes to be scored	(25443 eliminated genes)
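The `Annotated`, `Significant` and `Expected` columns in the tables in this section come from a 2x2 comparison of term membership against DE status. Below is a minimal sketch of the *classic* one-sided Fisher's exact test for a single term, using the GO:0004656 row of the MF table below. `N_total` is the annotated-gene count from the topGOdata log above. `n_sig = 125` is an **assumed** number of DE genes, chosen for illustration. Because `weight01` conditions each test on the GO graph, the classic p-value computed here need not match the `Fisher` column.

```r
# Counts for one MF term; N_total is from the log above, n_sig is ASSUMED
N_total   <- 25806  # genes annotated to MF terms
n_sig     <- 125    # assumed number of DE genes among them (illustrative)
annotated <- 117    # genes annotated to the term
sig_in    <- 6      # DE genes annotated to the term

# "Expected" = annotated genes times the overall DE fraction
expected <- annotated * n_sig / N_total

# 2x2 table: term membership (rows) by DE status (columns)
tab <- matrix(c(sig_in,         annotated - sig_in,
                n_sig - sig_in, N_total - annotated - n_sig + sig_in),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("in_term", "not_in_term"), c("DE", "not_DE")))

# One-sided (enrichment) Fisher's exact test, classic version
p_fisher <- fisher.test(tab, alternative = "greater")$p.value

# Same quantity as a hypergeometric upper tail, P(X >= sig_in)
p_hyper <- phyper(sig_in - 1, m = n_sig, n = N_total - n_sig, k = annotated,
                  lower.tail = FALSE)

round(expected, 2)
c(fisher = p_fisher, hypergeometric = p_hyper)
```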

In [252]:
#retrieve the GO enrichment 
goEnrichment_MF   <- GenTable(GOdata_MF, Fisher = results_go_MF, orderBy = "Fisher", topNodes = 100, numChar=1000)
In [253]:
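# Prepare the GO enrichment table for plotting (code adapted from an online example):
# convert Fisher p-values to numeric, keep terms with p < 0.05 and more than one DE gene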
goEnrichment_MF$Fisher <- as.numeric(goEnrichment_MF$Fisher)
goEnrichment_MF <- goEnrichment_MF[goEnrichment_MF$Fisher < 0.05,] 
goEnrichment_MF <- goEnrichment_MF[goEnrichment_MF$Significant > 1,] 
goEnrichment_MF <- goEnrichment_MF[,c("GO.ID","Term", "Annotated", "Significant", "Expected", "Fisher")]
goEnrichment_MF
A data.frame: 22 × 6
    | GO.ID | Term | Annotated | Significant | Expected | Fisher
    | <chr> | <chr> | <int> | <int> | <dbl> | <dbl>
1 | GO:0004656 | procollagen-proline 4-dioxygenase activity | 117 | 6 | 0.57 | 0.000023
2 | GO:0022848 | acetylcholine-gated cation-selective channel activity | 20 | 3 | 0.10 | 0.000120
3 | GO:0015464 | acetylcholine receptor activity | 21 | 3 | 0.10 | 0.000140
4 | GO:0031418 | L-ascorbic acid binding | 191 | 6 | 0.93 | 0.000340
5 | GO:0004222 | metalloendopeptidase activity | 275 | 7 | 1.33 | 0.000400
6 | GO:0017159 | pantetheine hydrolase activity | 10 | 2 | 0.05 | 0.001020
7 | GO:0004453 | juvenile-hormone esterase activity | 12 | 2 | 0.06 | 0.001490
8 | GO:0005507 | copper ion binding | 108 | 4 | 0.52 | 0.001910
16 | GO:0047275 | glucosaminylgalactosylglucosylceramide beta-galactosyltransferase activity | 23 | 2 | 0.11 | 0.005510
17 | GO:0004504 | peptidylglycine monooxygenase activity | 28 | 2 | 0.14 | 0.008100
18 | GO:0030414 | peptidase inhibitor activity | 316 | 5 | 1.53 | 0.008460
19 | GO:0008061 | chitin binding | 89 | 3 | 0.43 | 0.009310
20 | GO:0090729 | toxin activity | 101 | 3 | 0.49 | 0.013100
22 | GO:0017124 | SH3 domain binding | 199 | 4 | 0.96 | 0.016130
23 | GO:0003676 | nucleic acid binding | 6728 | 13 | 32.59 | 0.017540
24 | GO:0046914 | transition metal ion binding | 2226 | 21 | 10.78 | 0.018310
26 | GO:0004252 | serine-type endopeptidase activity | 340 | 5 | 1.65 | 0.025110
27 | GO:0005506 | iron ion binding | 475 | 6 | 2.30 | 0.028330
31 | GO:0004190 | aspartic-type endopeptidase activity | 140 | 3 | 0.68 | 0.030740
33 | GO:0005201 | extracellular matrix structural constituent | 181 | 5 | 0.88 | 0.036420
37 | GO:0004867 | serine-type endopeptidase inhibitor activity | 170 | 3 | 0.82 | 0.049850
38 | GO:0005262 | calcium channel activity | 253 | 4 | 1.23 | 0.049970
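`GenTable()` returns test results as character strings, which is why the cell above calls `as.numeric()`. Very small p-values can be reported as strings such as `"< 1e-30"`. `as.numeric()` turns these into `NA`, and the `Fisher < 0.05` filter then produces `NA` rows instead of keeping the term. Below is a minimal sketch of a guard, assuming the column may contain such `<` prefixes. The input vector is hypothetical.

```r
# Hypothetical Fisher column as returned by GenTable (character)
fisher_chr <- c("2.3e-05", "< 1e-30", "0.0499", "0.12")

# Naive conversion: the "< 1e-30" entry becomes NA (warning suppressed here)
fisher_naive <- suppressWarnings(as.numeric(fisher_chr))

# Strip a leading "<" so the reported bound is kept as a number
fisher_num <- as.numeric(sub("^<\\s*", "", fisher_chr))

# which() drops any remaining NA instead of returning NA rows
keep <- which(fisher_num < 0.05)
keep
```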
In [274]:
#write.table(goEnrichment_MF, "df_TopGO_Skogsbergia_sp_DE_unique_Upper_Lip_MF.tsv",sep = "\t", quote=FALSE)
In [256]:
# GO enrichment Upper Lip - MF 

ntop = 22
ggdata <- goEnrichment_MF[1:ntop,]
ggdata$Term <- factor(ggdata$Term, levels = rev(ggdata$Term)) 
plot_GO_UL_MF <- ggplot(ggdata,
                        aes(x = Term, y = -log10(Fisher), size = Significant, fill = -log10(Fisher))) +
  expand_limits(y = 1) +
  geom_point(shape = 21) +
  scale_size(range = c(2, 7)) +
  scale_fill_continuous(low = 'royalblue', high = 'red4') +
  labs(title = 'GO Analysis - Molecular Function', size = "Number of Genes") +
  theme_bw(base_size = 24) +
  theme(
    legend.position = 'right',
    legend.background = element_rect(),
    plot.title = element_text(angle = 0, size = 16, face = 'bold', vjust = 1),
    plot.subtitle = element_text(angle = 0, size = 10, face = 'bold', vjust = 1),
    plot.caption = element_text(angle = 0, size = 12, face = 'bold', vjust = 1),
    axis.text.x = element_blank(),
    axis.text.y = element_text(angle = 0, size = 13, face = 'bold', vjust = 0.5),
    axis.title.x = element_blank(),
    axis.title.y = element_text(size = 8, face = 'bold'),
    axis.line = element_line(colour = 'black'),
    legend.key = element_blank(),
    legend.text = element_text(size = 14, face = "bold"),
    title = element_text(size = 14, face = "bold")
  ) +
  coord_flip()

#dev.off()
In [257]:
plot_GO_UL_MF +labs(x=NULL)
In [60]:
#library (repr)
options(repr.plot.width=8, repr.plot.height=5)

WGCNA

To determine whether the BCN shares similarities with, and is conserved in, the co-expression networks of the non-bioluminescent relative, we checked whether BCN genes that have orthologs in the Skogsbergia sp. transcriptome keep their interactions in the Skogsbergia sp. co-expression networks. Co-expression networks for Skogsbergia sp. were generated with WGCNA from 15 samples in total: three tissues (upper lip, gut, and compound eye), each with 5 biological replicates. WGCNA (Weighted Gene Co-expression Network Analysis) builds a network from pairwise correlations between gene expression profiles and identifies clusters (modules) of highly correlated genes (Langfelder and Horvath, 2008). These modules often correspond to specific biological processes or pathways, which suggests that genes within a module may share a regulatory program. By analyzing the connectivity of genes within each module, WGCNA also helps identify hub genes (the most highly connected genes in a module), which may act as key drivers of the processes the module represents. The following scripts are from the WGCNA package (Langfelder and Horvath, 2008).
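The network construction described above can be made concrete. WGCNA's default (unsigned) adjacency raises the absolute Pearson correlation between two genes to a soft-thresholding power β, so a_ij = |cor(x_i, x_j)|^β. This shrinks weak correlations toward zero while largely keeping strong ones. Below is a minimal base-R sketch on a hypothetical toy matrix (samples in rows, genes in columns), using β = 7, the power chosen later in this notebook. The signed variant, ((1 + cor) / 2)^β, is included for comparison.

```r
set.seed(1)
# Hypothetical toy data: 15 samples (rows) x 6 genes (columns)
expr_toy <- matrix(rnorm(15 * 6), nrow = 15,
                   dimnames = list(paste0("s", 1:15), paste0("g", 1:6)))

beta    <- 7                           # soft-thresholding power
cor_mat <- cor(expr_toy)               # gene-gene Pearson correlations

A_unsigned <- abs(cor_mat)^beta        # unsigned: |cor|^beta
A_signed   <- ((1 + cor_mat) / 2)^beta # signed: ((1 + cor) / 2)^beta

# Soft thresholding shrinks weak links far more than strong ones
c(weak = 0.5^beta, strong = 0.9^beta)  # 0.0078125 and about 0.478
```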

Run WGCNA

In [20]:
# Extract the variance-stabilized expression matrix from the DESeq2 object
vsd_matrix <- assay(vsd_prefiltered_wgcna)
# WGCNA expects samples in rows and genes in columns, so transpose
datExpr <- t(vsd_matrix) 
## check rows/cols
nrow(datExpr)
ncol(datExpr)
15
25909
In [21]:
head(datExpr)
A matrix: 6 × 25909 of type dbl
TRINITY_DN0_c0_g1_i2 TRINITY_DN0_c0_g3_i1 TRINITY_DN0_c0_g4_i2 TRINITY_DN100005_c0_g1_i1 TRINITY_DN100010_c0_g1_i1 TRINITY_DN10001_c0_g2_i1 TRINITY_DN10001_c1_g1_i5 TRINITY_DN10002_c0_g2_i1 TRINITY_DN10002_c0_g3_i1 TRINITY_DN10002_c1_g2_i2 … TRINITY_DN9992_c2_g1_i1 TRINITY_DN9993_c0_g1_i1 TRINITY_DN9997_c0_g1_i14 TRINITY_DN9997_c2_g1_i4 TRINITY_DN9998_c0_g1_i1 TRINITY_DN9999_c0_g2_i1 TRINITY_DN999_c0_g1_i5 TRINITY_DN99_c0_g1_i8 TRINITY_DN99_c1_g1_i2 TRINITY_DN9_c0_g1_i1
Sk.10A_fasta90_isoform.counts.tab -0.4833034 3.513079 2.2217544 3.7108665 -0.4833034 -0.4833034 1.549772 3.0097771 4.424157 5.123181 … -0.4833034 1.5497719 -0.4833034 -0.4833034 4.179557 4.7272979 7.113906 6.595724 -0.4833034 6.409652
Sk.10B_fasta90_isoform.counts.tab 4.2061223 1.286318 5.0482202 2.8977955 1.2863178 -0.4833034 3.887132 -0.4833034 5.379136 5.749601 … -0.4833034 3.7628464 -0.4833034 -0.4833034 4.879275 3.8871317 7.405441 7.840629 2.6371703 7.068321
Sk.10C_fasta90_isoform.counts.tab 8.4121435 5.388425 9.1129710 2.8702510 3.1876272 2.0999435 4.621741 2.9841075 4.786468 4.586408 … 3.5239739 4.2673707 4.2673707 3.7327735 4.222018 1.8764900 5.811802 8.119777 4.7864682 3.796123
Sk.6A_fasta90_isoform.counts.tab -0.4833034 3.088815 3.5962822 -0.4833034 3.4462981 1.8759402 4.512263 1.8759402 4.512263 5.542985 … 3.9702006 4.2666132 -0.4833034 -0.4833034 3.446298 -0.4833034 5.983687 6.232227 1.2676680 6.137886
Sk.6B_fasta90_isoform.counts.tab 1.8396698 5.415046 -0.4833034 3.3989581 -0.4833034 -0.4833034 4.734369 2.5674204 3.398958 5.251051 … 3.3989581 -0.4833034 -0.4833034 -0.4833034 1.839670 -0.4833034 6.390961 4.734369 2.5674204 3.920894
Sk.6C_fasta90_isoform..counts.tab 8.8224665 6.882231 9.9898509 2.2964139 2.9891895 1.8805966 3.738243 1.8805966 4.911712 5.603886 … 4.1323955 4.1808162 4.9952269 4.0822873 4.082287 1.2714137 5.817739 8.107196 5.2657030 4.441411
In [22]:
head(vsd_matrix)
A matrix: 6 × 15 of type dbl
Sk.10A_fasta90_isoform.counts.tab Sk.10B_fasta90_isoform.counts.tab Sk.10C_fasta90_isoform.counts.tab Sk.6A_fasta90_isoform.counts.tab Sk.6B_fasta90_isoform.counts.tab Sk.6C_fasta90_isoform..counts.tab Sk.7A_fasta90_isoform.counts.tab Sk.7B_fasta90_isoform.counts.tab Sk.7C_fasta90_isoform.counts.tab Sk.8A_fasta90_isoform.counts.tab Sk.8B_fasta90_isoform..counts.tab Sk.8C_fasta90_isoform..counts.tab Sk.9A_fasta90_isoform.counts.tab Sk.9B_fasta90_isoform..counts.tab Sk.9C_fasta90_isoform.counts.tab
TRINITY_DN0_c0_g1_i2 -0.4833034 4.2061223 8.412143 -0.4833034 1.8396698 8.822247 3.818597 2.2941273 6.803391 4.036179 4.855381 6.139561 3.083563 2.9608213 7.977188
TRINITY_DN0_c0_g3_i1 3.5130787 1.2863178 5.388425 3.0888149 5.4150465 6.882231 3.633338 1.8785281 4.244778 3.052279 2.128651 4.051979 2.486303 -0.4833034 5.060960
TRINITY_DN0_c0_g4_i2 2.2217544 5.0482202 9.112971 3.5962822 -0.4833034 9.989851 4.834828 4.0794694 8.557138 6.440424 7.638586 7.961867 4.313425 6.9771491 9.330997
TRINITY_DN100005_c0_g1_i1 3.7108665 2.8977955 2.870251 -0.4833034 3.3989581 2.296414 3.300928 1.8785281 2.016921 3.052279 1.150548 2.643361 3.083563 3.8314892 2.204853
TRINITY_DN100010_c0_g1_i1 -0.4833034 1.2863178 3.187627 3.4462981 -0.4833034 2.989190 1.941278 1.8785281 2.990843 2.575878 3.089022 3.418334 1.411223 -0.4833034 2.045164
TRINITY_DN10001_c0_g2_i1 -0.4833034 -0.4833034 2.099944 1.8759402 -0.4833034 1.880597 2.236582 -0.4833034 2.016921 2.259089 3.253304 2.248644 3.083563 3.3743912 3.227663
In [23]:
# Check for genes and samples with too many missing values or zero variance
gsg=goodSamplesGenes(datExpr, verbose = 3)
gsg$allOK
 Flagging genes and samples with too many missing values...
  ..step 1
TRUE
In [24]:
# Choose a set of soft-thresholding powers
powers = c(c(1:10), seq(from = 12, to=20, by=2))
# Call the network topology analysis function
sft = pickSoftThreshold(datExpr, powerVector = powers, verbose = 5)
pickSoftThreshold: will use block size 1726.
 pickSoftThreshold: calculating connectivity for given powers...
   ..working on genes 1 through 1726 of 25909
   ..working on genes 1727 through 3452 of 25909
   ..working on genes 3453 through 5178 of 25909
   ..working on genes 5179 through 6904 of 25909
   ..working on genes 6905 through 8630 of 25909
   ..working on genes 8631 through 10356 of 25909
   ..working on genes 10357 through 12082 of 25909
   ..working on genes 12083 through 13808 of 25909
   ..working on genes 13809 through 15534 of 25909
   ..working on genes 15535 through 17260 of 25909
   ..working on genes 17261 through 18986 of 25909
   ..working on genes 18987 through 20712 of 25909
   ..working on genes 20713 through 22438 of 25909
   ..working on genes 22439 through 24164 of 25909
   ..working on genes 24165 through 25890 of 25909
   ..working on genes 25891 through 25909 of 25909
   Power SFT.R.sq  slope truncated.R.sq mean.k. median.k. max.k.
1      1   0.3770  4.520          0.949  7330.0   7100.00  10900
2      2   0.0518  0.660          0.834  3050.0   2800.00   6120
3      3   0.0748 -0.511          0.758  1530.0   1310.00   3940
4      4   0.5370 -1.230          0.857   858.0    690.00   2790
5      5   0.8210 -1.610          0.945   523.0    391.00   2100
6      6   0.9170 -1.750          0.974   339.0    236.00   1640
7      7   0.9540 -1.810          0.983   230.0    149.00   1330
8      8   0.9740 -1.810          0.989   162.0     98.20   1110
9      9   0.9800 -1.810          0.989   118.0     66.60    950
10    10   0.9870 -1.800          0.991    88.7     46.60    827
11    12   0.9940 -1.740          0.994    53.3     24.40    652
12    14   0.9980 -1.690          0.997    34.4     13.70    534
13    16   0.9970 -1.620          0.997    23.6      8.16    450
14    18   0.9950 -1.560          0.996    16.9      5.09    388
15    20   0.9920 -1.510          0.995    12.6      3.31    340
In [25]:
# Plot the results
cex1 = 0.9;
# Scale-free topology fit index as a function of the soft-thresholding power
plot(sft$fitIndices[,1], -sign(sft$fitIndices[,3])*sft$fitIndices[,2],
     xlab="Soft Threshold (power)",ylab="Scale Free Topology Model Fit,signed R^2",type="n",
     main = paste("Scale independence"));
text(sft$fitIndices[,1], -sign(sft$fitIndices[,3])*sft$fitIndices[,2],
     labels=powers,cex=cex1,col="red");
# Red line: signed R^2 cut-off of 0.90
abline(h=0.90,col="red")
# Mean connectivity as a function of the soft-thresholding power
plot(sft$fitIndices[,1], sft$fitIndices[,5],
     xlab="Soft Threshold (power)",ylab="Mean Connectivity", type="n",
     main = paste("Mean connectivity"))
text(sft$fitIndices[,1], sft$fitIndices[,5], labels=powers, cex=cex1,col="red")
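A common rule is to choose the lowest power whose signed scale-free fit R^2 reaches the red cut-off line. Below is a minimal sketch of that rule, applied to the values printed by `pickSoftThreshold()` above (copied by hand). As in the plot, signed R^2 = -sign(slope) × SFT.R.sq. On these numbers, the rule gives 6 at a 0.90 cut-off and 7 at 0.95; the notebook uses `softPower = 7`. `pickSoftThreshold()` also returns its own suggestion in `sft$powerEstimate`. The helper `lowest_power_above()` is hypothetical.

```r
# Values copied from the pickSoftThreshold() output above
powers  <- c(1:10, seq(12, 20, by = 2))
sft_rsq <- c(0.3770, 0.0518, 0.0748, 0.5370, 0.8210, 0.9170, 0.9540, 0.9740,
             0.9800, 0.9870, 0.9940, 0.9980, 0.9970, 0.9950, 0.9920)
slope   <- c(4.520, 0.660, -0.511, -1.230, -1.610, -1.750, -1.810, -1.810,
             -1.810, -1.800, -1.740, -1.690, -1.620, -1.560, -1.510)

# Signed fit index, as plotted above: positive only when the slope is negative
signed_rsq <- -sign(slope) * sft_rsq

# Lowest power whose signed R^2 reaches the cut-off (NA if none does)
lowest_power_above <- function(powers, signed_rsq, cutoff) {
  ok <- which(signed_rsq >= cutoff)
  if (length(ok) == 0) NA else powers[min(ok)]
}

lowest_power_above(powers, signed_rsq, 0.90)  # 6
lowest_power_above(powers, signed_rsq, 0.95)  # 7
```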
In [26]:
# Weighted adjacency matrix from gene-gene correlations, using the chosen soft-thresholding power
softPower=7
adjacency = adjacency(datExpr, power = softPower)
In [27]:
# Topological Overlap Matrix (TOM)
# Turn adjacency into topological overlap, i.e. translate the adjacency into 
# topological overlap matrix and calculate the corresponding dissimilarity:
TOM = TOMsimilarity(adjacency, TOMType = "signed", verbose = 5);
dissTOM = 1-TOM;
..connectivity..
..matrix multiplication (system BLAS)..
..normalization..
..done.
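The cell above converts adjacency into topological overlap, which scores a pair of genes by how many neighbours they share as well as by their direct link. Below is a minimal base-R sketch of the standard *unsigned* TOM formula, showing what is computed conceptually. The cell above requests `TOMType = "signed"`; this sketch covers only the unsigned formula and has not been checked numerically against WGCNA's implementation. The function name `tom_unsigned()` is hypothetical.

```r
# Unsigned topological overlap:
#   TOM_ij = (l_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij)
#   l_ij   = sum over u != i, j of a_iu * a_uj   (shared neighbours)
#   k_i    = sum over u != i of a_iu             (connectivity)
tom_unsigned <- function(A) {
  diag(A) <- 0                     # drop self-links so u = i, j add nothing
  L <- A %*% A                     # shared-neighbour term l_ij
  k <- rowSums(A)                  # connectivity k_i
  TOM <- (L + A) / (outer(k, k, pmin) + 1 - A)
  diag(TOM) <- 1
  TOM
}

# Worked example: three genes, every pair with adjacency 0.5
A3 <- matrix(0.5, 3, 3); diag(A3) <- 1
T3 <- tom_unsigned(A3)  # off-diagonal: (0.25 + 0.5) / (1 + 1 - 0.5) = 0.5
dissT3 <- 1 - T3        # dissimilarity, as used for clustering below
```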
In [28]:
# Call the hierarchical clustering function
geneTree = hclust(as.dist(dissTOM), method = "average");
In [64]:
minModuleSize = 50
# Module identification using dynamic tree cut:
dynamicMods = cutreeDynamic(dendro = geneTree, distM = dissTOM,
                deepSplit = 2, pamRespectsDendro = FALSE,
                minClusterSize = minModuleSize);
 ..cutHeight not given, setting it to 0.993  ===>  99% of the (truncated) height range in dendro.
 ..done.
In [65]:
table(dynamicMods)
dynamicColors = labels2colors(dynamicMods)
dynamicMods
   1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16 
4054  676  661  425  407  377  372  365  360  348  346  314  275  264  256  254 
  17   18   19   20   21   22   23   24   25   26   27   28   29   30   31   32 
 239  239  232  231  230  225  224  222  221  220  214  214  211  209  205  201 
  33   34   35   36   37   38   39   40   41   42   43   44   45   46   47   48 
 199  197  195  195  192  186  183  183  182  181  171  165  164  164  164  163 
  49   50   51   52   53   54   55   56   57   58   59   60   61   62   63   64 
 160  159  158  157  154  151  150  150  149  146  144  144  144  143  142  142 
  65   66   67   68   69   70   71   72   73   74   75   76   77   78   79   80 
 141  140  139  138  138  138  136  136  135  134  133  131  130  129  129  128 
  81   82   83   84   85   86   87   88   89   90   91   92   93   94   95   96 
 127  127  127  127  127  125  124  124  123  122  122  121  121  120  120  119 
  97   98   99  100  101  102  103  104  105  106  107  108  109  110  111  112 
 119  116  116  115  112  112  111  109  109  108  106  105  104  104  103  103 
 113  114  115  116  117  118  119  120  121  122  123  124  125  126  127  128 
 102  101  101   99   99   97   96   95   94   86   84   84   84   82   81   73 
In [66]:
table(dynamicColors)
dynamicColors
  antiquewhite1   antiquewhite2   antiquewhite4         bisque4           black 
             96             122             144             164             372 
           blue           blue2           blue4      blueviolet           brown 
            676             131             109             109             661 
         brown2          brown4      chocolate4           coral          coral1 
            133             164              97             123             144 
         coral2          coral3          coral4            cyan       darkgreen 
            143             122              95             264             225 
       darkgrey     darkmagenta  darkolivegreen darkolivegreen2 darkolivegreen4 
            222             197             199             111             134 
     darkorange     darkorange2         darkred   darkseagreen2   darkseagreen3 
            220             164             230              99             124 
  darkseagreen4   darkslateblue   darkturquoise      darkviolet        deeppink 
            144             163             224             130             108 
     firebrick3      firebrick4     floralwhite           green          green4 
            112             135             165             407              99 
    greenyellow          grey60        honeydew       honeydew1      indianred3 
            346             239             124             146             112 
     indianred4           ivory  lavenderblush1  lavenderblush2  lavenderblush3 
            136             171             101             125             149 
     lightblue4      lightcoral       lightcyan      lightcyan1      lightgreen 
            115             136             254             181             239 
     lightpink2      lightpink3      lightpink4   lightskyblue4  lightslateblue 
            101             127             150              73             116 
 lightsteelblue lightsteelblue1     lightyellow         magenta        magenta3 
            138             182             232             360             102 
       magenta4          maroon    mediumorchid    mediumpurple   mediumpurple1 
            127             150             142              81             116 
  mediumpurple2   mediumpurple3   mediumpurple4    midnightblue       mistyrose 
            138             183             121             256              94 
    navajowhite    navajowhite1    navajowhite2          orange       orangered 
            103             127             151             221              82 
     orangered1      orangered3      orangered4   paleturquoise  palevioletred1 
            119             138             183             205             103 
 palevioletred2  palevioletred3            pink           pink3           pink4 
            127             154             365              84             119 
           plum           plum1           plum2           plum3           plum4 
            139             186             160             129             106 
         purple             red       royalblue     saddlebrown          salmon 
            348             377             231             211             275 
        salmon1         salmon2         salmon4         sienna2         sienna3 
            104             127             157              84             195 
        sienna4         skyblue        skyblue1        skyblue2        skyblue3 
            120             214             140             142             192 
       skyblue4       slateblue       steelblue             tan            tan4 
            121              86             209             314             104 
        thistle        thistle1        thistle2        thistle3        thistle4 
            128             158             159             129             105 
      turquoise          violet           white          yellow         yellow2 
           4054             201             214             425              84 
        yellow3         yellow4     yellowgreen 
            120             141             195 
In [67]:
# Plot the dendrogram and colors underneath
#sizeGrWindow(8,6)
plotDendroAndColors(geneTree, dynamicColors, "Dynamic Tree Cut",
                    dendroLabels = FALSE, hang = 0.03,
                    addGuide = TRUE, guideHang = 0.05,
                    main = "Gene dendrogram and module colors")
In [68]:
# Calculate eigengenes
MEList = moduleEigengenes(datExpr, colors = dynamicColors)
MEs = MEList$eigengenes
# Calculate dissimilarity of module eigengenes
MEDiss = 1-cor(MEs);
# Cluster module eigengenes
METree = hclust(as.dist(MEDiss), method = "average");
# Plot the result
#sizeGrWindow(7, 6)
plot(METree, main = "Clustering of module eigengenes",
     xlab = "", sub = "")
# Merge modules whose eigengene dissimilarity (1 - correlation) is below 0.30
MEDissThres = 0.30
# Plot the cut line into the dendrogram
abline(h=MEDissThres, col = "red")
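A module eigengene is the first principal component of the module's standardized expression matrix. Modules are merged when 1 - cor between their eigengenes falls below `MEDissThres` (0.30 here, i.e. a correlation above 0.70). Below is a minimal base-R sketch on two hypothetical toy modules. Flipping the sign so the eigengene follows the module's average expression mirrors the default alignment of WGCNA's `moduleEigengenes()`. That function also handles missing values and other options not shown here. `module_eigengene()` is a hypothetical helper.

```r
set.seed(2)
n <- 15
# Hypothetical modules: 8 genes each, driven by a shared signal plus noise
signal1 <- rnorm(n)
signal2 <- signal1 + rnorm(n, sd = 0.3)   # second module closely tracks the first
mod1 <- sapply(1:8, function(j) signal1 + rnorm(n, sd = 0.5))
mod2 <- sapply(1:8, function(j) signal2 + rnorm(n, sd = 0.5))

module_eigengene <- function(X) {
  Xs <- scale(X)                              # standardise each gene (column)
  me <- svd(Xs)$u[, 1]                        # first principal component over samples
  if (cor(me, rowMeans(Xs)) < 0) me <- -me    # align with average expression
  me
}

me1 <- module_eigengene(mod1)
me2 <- module_eigengene(mod2)
me_diss <- 1 - cor(me1, me2)   # eigengene dissimilarity, as in MEDiss
me_diss < 0.30                 # TRUE would mean these two modules get merged
```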
In [69]:
# Call an automatic merging function
merge = mergeCloseModules(datExpr, dynamicColors, cutHeight = MEDissThres, verbose = 3)
# The merged module colors
mergedColors = merge$colors;
# Eigengenes of the new merged modules:
mergedMEs = merge$newMEs;
 mergeCloseModules: Merging modules whose distance is less than 0.3
   multiSetMEs: Calculating module MEs.
     Working on set 1 ...
     moduleEigengenes: Calculating 128 module eigengenes in given set.
   multiSetMEs: Calculating module MEs.
     Working on set 1 ...
     moduleEigengenes: Calculating 72 module eigengenes in given set.
   multiSetMEs: Calculating module MEs.
     Working on set 1 ...
     moduleEigengenes: Calculating 55 module eigengenes in given set.
   multiSetMEs: Calculating module MEs.
     Working on set 1 ...
     moduleEigengenes: Calculating 50 module eigengenes in given set.
   multiSetMEs: Calculating module MEs.
     Working on set 1 ...
     moduleEigengenes: Calculating 49 module eigengenes in given set.
   Calculating new MEs...
   multiSetMEs: Calculating module MEs.
     Working on set 1 ...
     moduleEigengenes: Calculating 49 module eigengenes in given set.
In [70]:
plotDendroAndColors(geneTree, cbind(dynamicColors, mergedColors),
                    c("Dynamic Tree Cut", "Merged dynamic"),
                    dendroLabels = FALSE, hang = 0.03,
                    addGuide = TRUE, guideHang = 0.05)
In [71]:
table(mergedColors)
mergedColors
 antiquewhite2  antiquewhite4        bisque4          black           blue 
           489            594            164            619            676 
         blue2          blue4          brown         brown2         brown4 
          1953            109            800            239            848 
    chocolate4         coral1         coral4           cyan       darkgrey 
           220            699           2352           1183           1265 
   darkmagenta        darkred  darkseagreen2  darkseagreen3  darkturquoise 
           798            230             99            236            548 
    darkviolet       deeppink          green         green4       honeydew 
           559            108            407             99            225 
lavenderblush3     lightgreen     lightpink2     lightpink3  lightskyblue4 
           697            911            101            127             73 
 mediumpurple2  mediumpurple3  mediumpurple4   midnightblue      mistyrose 
           224            183            121            256            251 
   navajowhite   navajowhite2         orange      orangered palevioletred1 
           103            151            221            966            103 
          pink          plum1        salmon1        skyblue       skyblue1 
           365            328            104            214            140 
           tan           tan4      turquoise        yellow2 
           509            104           4054             84 

Module to trait heatmap

Identify modules that are significantly associated with each tissue (trait). The module eigengene summarizes the expression pattern of all genes in a module, so correlating eigengenes with the traits reveals which modules are most strongly associated with each tissue (Langfelder and Horvath, 2008). These correlations are then visualized in a module-trait heatmap. The following scripts are from the WGCNA package (Langfelder and Horvath, 2008).
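Below is a minimal sketch of the correlation and p-value behind one cell of a module-trait heatmap. It assumes the usual approach of a Pearson correlation plus a Student t-test with n - 2 degrees of freedom, which is, to our knowledge, the formula behind WGCNA's `corPvalueStudent()`. The eigengene and trait are hypothetical. Base R's `cor.test()` computes the same p-value and serves as a check. `cor_p_student()` is a hypothetical helper.

```r
set.seed(3)
nSamples <- 15
# Hypothetical 0/1 trait (upper lip = 1) in the repeating sample order used here
upper_lip <- rep(c(1, 0, 0), times = 5)
me <- upper_lip + rnorm(nSamples, sd = 0.5)   # hypothetical module eigengene

# Student t-based p-value for a Pearson correlation r from n samples
cor_p_student <- function(r, n) {
  t_stat <- sqrt(n - 2) * r / sqrt(1 - r^2)
  2 * pt(abs(t_stat), df = n - 2, lower.tail = FALSE)
}

r <- cor(me, upper_lip)
p <- cor_p_student(r, nSamples)
c(r = r, p = p)
```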

In [39]:
skogs_traits <- read_excel("skogs_trait.xlsx")
In [40]:
skogs_traits
A tibble: 15 × 5
Sample_Name | Tissue_Type | Upper_lip | Eye | Gut
<chr> | <chr> | <dbl> | <dbl> | <dbl>
Sk.10A_fasta97.counts.tab | Upper_lip | 1 | 0 | 0
Sk.10B_fasta97.counts.tab | Eye | 0 | 1 | 0
Sk.10C_fasta97.counts.tab | Gut | 0 | 0 | 1
Sk.6A_fasta97.counts.tab | Upper_lip | 1 | 0 | 0
Sk.6B_fasta97.counts.tab | Eye | 0 | 1 | 0
Sk.6C_fasta97.counts.tab | Gut | 0 | 0 | 1
Sk.7A_fasta97.counts.tab | Upper_lip | 1 | 0 | 0
Sk.7B_fasta97.counts.tab | Eye | 0 | 1 | 0
Sk.7C_fasta97.counts.tab | Gut | 0 | 0 | 1
Sk.8A_fasta97.counts.tab | Upper_lip | 1 | 0 | 0
Sk.8B_fasta97.counts.tab | Eye | 0 | 1 | 0
Sk.8C_fasta97.counts.tab | Gut | 0 | 0 | 1
Sk.9A_fasta97.counts.tab | Upper_lip | 1 | 0 | 0
Sk.9B_fasta97.counts.tab | Eye | 0 | 1 | 0
Sk.9C_fasta97.counts.tab | Gut | 0 | 0 | 1
In [41]:
sample_names_numeric <- c("1", "2", "3", "1", "2" ,"3", "1", "2", "3", "1", "2", "3", "1", "2", "3")
In [42]:
skogs_traits$sample_names_numeric <- sample_names_numeric
In [43]:
skogs_traits
A tibble: 15 × 6
Sample_Name | Tissue_Type | Upper_lip | Eye | Gut | sample_names_numeric
<chr> | <chr> | <dbl> | <dbl> | <dbl> | <chr>
Sk.10A_fasta97.counts.tab | Upper_lip | 1 | 0 | 0 | 1
Sk.10B_fasta97.counts.tab | Eye | 0 | 1 | 0 | 2
Sk.10C_fasta97.counts.tab | Gut | 0 | 0 | 1 | 3
Sk.6A_fasta97.counts.tab | Upper_lip | 1 | 0 | 0 | 1
Sk.6B_fasta97.counts.tab | Eye | 0 | 1 | 0 | 2
Sk.6C_fasta97.counts.tab | Gut | 0 | 0 | 1 | 3
Sk.7A_fasta97.counts.tab | Upper_lip | 1 | 0 | 0 | 1
Sk.7B_fasta97.counts.tab | Eye | 0 | 1 | 0 | 2
Sk.7C_fasta97.counts.tab | Gut | 0 | 0 | 1 | 3
Sk.8A_fasta97.counts.tab | Upper_lip | 1 | 0 | 0 | 1
Sk.8B_fasta97.counts.tab | Eye | 0 | 1 | 0 | 2
Sk.8C_fasta97.counts.tab | Gut | 0 | 0 | 1 | 3
Sk.9A_fasta97.counts.tab | Upper_lip | 1 | 0 | 0 | 1
Sk.9B_fasta97.counts.tab | Eye | 0 | 1 | 0 | 2
Sk.9C_fasta97.counts.tab | Gut | 0 | 0 | 1 | 3
In [44]:
sample_name
  1. 'Upper_lip'
  2. 'Eye'
  3. 'Gut'
  4. 'Upper_lip'
  5. 'Eye'
  6. 'Gut'
  7. 'Upper_lip'
  8. 'Eye'
  9. 'Gut'
  10. 'Upper_lip'
  11. 'Eye'
  12. 'Gut'
  13. 'Upper_lip'
  14. 'Eye'
  15. 'Gut'
In [45]:
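skogsSamples =as.character(sample_name)
# match() returns the first row of skogs_traits for each tissue label (1, 2 or 3);
# this suffices here because all samples of a tissue share the same 0/1 trait values
traitRows =match(skogsSamples,skogs_traits$Tissue_Type)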
In [46]:
traitRows
  1. 1
  2. 2
  3. 3
  4. 1
  5. 2
  6. 3
  7. 1
  8. 2
  9. 3
  10. 1
  11. 2
  12. 3
  13. 1
  14. 2
  15. 3
In [47]:
skogsTraits =skogs_traits[traitRows, -1]
In [48]:
skogsTraits
A tibble: 15 × 5
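Tissue_Type | Upper_lip | Eye | Gut | sample_names_numeric
<chr> | <dbl> | <dbl> | <dbl> | <chr>
Upper_lip | 1 | 0 | 0 | 1
Eye | 0 | 1 | 0 | 2
Gut | 0 | 0 | 1 | 3
Upper_lip | 1 | 0 | 0 | 1
Eye | 0 | 1 | 0 | 2
Gut | 0 | 0 | 1 | 3
Upper_lip | 1 | 0 | 0 | 1
Eye | 0 | 1 | 0 | 2
Gut | 0 | 0 | 1 | 3
Upper_lip | 1 | 0 | 0 | 1
Eye | 0 | 1 | 0 | 2
Gut | 0 | 0 | 1 | 3
Upper_lip | 1 | 0 | 0 | 1
Eye | 0 | 1 | 0 | 2
Gut | 0 | 0 | 1 | 3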
In [49]:
rownames(skogsTraits)=skogs_traits$Sample_Name
Warning message:
“Setting row names on a tibble is deprecated.”
In [50]:
skogsTraits$Tissue_Type <- NULL
In [51]:
skogsTraits$sample_names_numeric <- NULL
In [52]:
rownames(skogsTraits)=skogs_traits$Sample_Name
Warning message:
“Setting row names on a tibble is deprecated.”
In [53]:
skogsTraits
A tibble: 15 × 3
    | Upper_lip | Eye | Gut
    | <dbl> | <dbl> | <dbl>
Sk.10A_fasta97.counts.tab | 1 | 0 | 0
Sk.10B_fasta97.counts.tab | 0 | 1 | 0
Sk.10C_fasta97.counts.tab | 0 | 0 | 1
Sk.6A_fasta97.counts.tab | 1 | 0 | 0
Sk.6B_fasta97.counts.tab | 0 | 1 | 0
Sk.6C_fasta97.counts.tab | 0 | 0 | 1
Sk.7A_fasta97.counts.tab | 1 | 0 | 0
Sk.7B_fasta97.counts.tab | 0 | 1 | 0
Sk.7C_fasta97.counts.tab | 0 | 0 | 1
Sk.8A_fasta97.counts.tab | 1 | 0 | 0
Sk.8B_fasta97.counts.tab | 0 | 1 | 0
Sk.8C_fasta97.counts.tab | 0 | 0 | 1
Sk.9A_fasta97.counts.tab | 1 | 0 | 0
Sk.9B_fasta97.counts.tab | 0 | 1 | 0
Sk.9C_fasta97.counts.tab | 0 | 0 | 1
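In the cells above, the trait rows are lined up with the expression samples by position. The trait sample names (`..._fasta97.counts.tab`) also differ from the expression row names (`..._fasta90_isoform.counts.tab`), so the two cannot be matched directly by name. A more defensive alternative matches on the shared sample ID (e.g. `Sk.10A`) and stops if any sample is missing. Below is a minimal sketch using a few names taken from the outputs above. The helper `sample_id()` is hypothetical, and the commented line shows how it might be applied to the notebook objects.

```r
# A few sample names as they appear in the expression and trait objects above
expr_names  <- c("Sk.10A_fasta90_isoform.counts.tab",
                 "Sk.6C_fasta90_isoform..counts.tab",
                 "Sk.8B_fasta90_isoform..counts.tab")
trait_names <- c("Sk.8B_fasta97.counts.tab",
                 "Sk.10A_fasta97.counts.tab",
                 "Sk.6C_fasta97.counts.tab")

# Keep only the sample ID before the first underscore, e.g. "Sk.10A"
sample_id <- function(x) sub("_.*$", "", x)

# Trait-table rows that line up with the expression rows
idx <- match(sample_id(expr_names), sample_id(trait_names))
stopifnot(!anyNA(idx))   # fail loudly if a sample has no trait row
trait_names[idx]

# Hypothetical use on the notebook objects:
# skogsTraits <- skogs_traits[match(sample_id(rownames(datExpr)),
#                                   sample_id(skogs_traits$Sample_Name)), ]
```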
In [72]:
# Re-cluster samples
sampleTree2 = hclust(dist(datExpr), method = "average")
# Convert traits to a color representation: white means low, red means high, grey means missing entry
traitColors = numbers2colors(skogsTraits, signed = FALSE);
# Plot the sample dendrogram and the colors underneath.
plotDendroAndColors(sampleTree2, traitColors,
                    groupLabels = names(skogsTraits),
                    main = "Sample dendrogram and trait heatmap")
In [73]:
#Define numbers of genes and samples
nGenes = ncol(datExpr);
nSamples = nrow(datExpr);
In [74]:
mergedMEs
A data.frame: 15 × 49 (first 10 and last 10 columns shown)
MEgreen MEmediumpurple3 MEbrown MEgreen4 MEplum1 MEblue MEnavajowhite MEpink MEdarkturquoise MEmidnightblue MEmediumpurple4 MEdarkred MEmediumpurple2 MEbrown4 MElightgreen MElightpink3 MEantiquewhite2 MEcyan MEdarkgrey MElavenderblush3
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
Sk.10A_fasta90_isoform.counts.tab -0.26934560 0.239111985 0.21711515 0.2196965 0.33743814 0.38459592 0.210523129 0.22898410 -0.275992687 -0.0067371669 -0.29416770 0.01966168 -0.437762559 -0.11722514 -0.16101715 -0.471924718 -0.411944882 -0.118025275 -0.37912216 -0.83926591
Sk.10B_fasta90_isoform.counts.tab 0.36385952 -0.007371934 0.12276540 0.1941522 -0.42165759 0.03176824 0.274046968 0.14447133 -0.579527546 -0.9122213249 0.41345983 0.07089089 0.319179049 -0.36194423 0.22648524 0.151868843 0.342325747 0.341359081 -0.35200566 0.06138683
Sk.10C_fasta90_isoform.counts.tab -0.21790913 -0.011016493 -0.26681163 -0.1918144 -0.01575783 -0.33956148 -0.014072272 -0.31595093 0.097729992 0.1130214763 0.27385114 0.11443854 0.175480675 0.21449240 0.17651216 0.148560458 0.155211568 0.158183410 0.19417727 0.12236796
Sk.6A_fasta90_isoform.counts.tab -0.14333603 0.082260204 0.30458464 0.3042555 0.30150856 0.18309922 -0.175759359 0.06272879 -0.014018570 -0.1193309938 -0.02662548 -0.08716423 -0.119121497 -0.35679694 -0.03385627 0.076741854 -0.504193823 -0.527925489 -0.24347169 0.03993928
Sk.6B_fasta90_isoform.counts.tab 0.23033343 0.327107159 0.44063041 0.2037847 -0.37235099 -0.31432307 -0.542133755 0.01547406 0.305014706 0.1205587318 -0.22430288 0.10804807 -0.356488587 -0.39824089 -0.56341048 0.115566660 -0.382434967 -0.544729289 -0.43512269 -0.37177812
Sk.6C_fasta90_isoform..counts.tab -0.20274441 -0.116352875 -0.36931180 -0.2444215 -0.01162888 -0.43694503 -0.099937951 -0.33181963 0.145334511 0.1672616127 0.24459562 0.15982385 0.211792805 0.21665031 0.17363847 0.160412648 0.106372383 0.156229905 0.22660385 0.15837058
Sk.7A_fasta90_isoform.counts.tab -0.16411038 0.121349921 0.12393708 0.0922627 0.21770877 0.19909178 0.069173717 0.16083530 0.089936298 -0.0779037430 0.16381075 -0.11475782 -0.007041979 -0.01824139 -0.04284008 -0.024305034 0.090665369 -0.023293825 0.06879618 -0.02009418
Sk.7B_fasta90_isoform.counts.tab 0.44973663 0.055679870 0.13609620 -0.4970754 -0.42405160 0.25406830 -0.423611257 -0.39239722 0.214826706 0.1226842710 -0.30178804 -0.92468927 -0.250951811 -0.39290203 -0.59863318 0.004364871 -0.111723747 0.293291886 0.43554339 0.10703000
Sk.7C_fasta90_isoform.counts.tab -0.17923658 -0.032781232 -0.32510768 -0.2973736 0.05071193 -0.07860540 0.020153670 -0.08677892 0.140229555 0.1643728295 0.25370009 0.06203432 0.189629616 0.21293358 0.10853968 0.152779149 0.165807589 0.158790080 0.27728790 0.14015811
Sk.8A_fasta90_isoform.counts.tab -0.24015185 0.039947087 0.21749287 0.1896878 0.22740798 0.05089790 0.227135388 0.14999242 0.055520253 -0.0002117171 -0.30059938 0.08824079 -0.138237303 0.03453946 -0.10540618 -0.780560237 0.003392111 -0.239162419 -0.20499919 -0.01021734
Sk.8B_fasta90_isoform..counts.tab 0.29100591 -0.868974481 -0.35458179 -0.2633693 0.12508467 0.18897530 0.373624934 0.41138984 0.180041704 0.1092121538 -0.33826340 0.13111185 0.153882653 0.15290215 0.08892941 0.134897554 0.253431890 0.105330664 -0.03070693 0.13342488
Sk.8C_fasta90_isoform..counts.tab -0.17168960 0.081337081 -0.09715687 -0.1083820 0.09342857 -0.09713346 0.005021318 -0.05011152 0.107373220 0.1127662865 0.12452032 0.08169707 0.101821476 0.14507662 0.08786909 0.056994174 0.096089464 0.003888645 0.13612125 0.06728476
Sk.9A_fasta90_isoform.counts.tab 0.04706604 0.043740188 0.03494143 0.2335867 0.20903158 0.40289036 0.321926743 0.46237581 0.001279817 0.0446154914 0.08274007 0.11856089 -0.400434680 0.01011812 0.20725601 0.024737510 0.262811048 -0.091409416 0.03460446 0.10442490
Sk.9B_fasta90_isoform..counts.tab 0.40005577 0.145757161 0.13461336 0.3600912 -0.35671373 -0.22340775 -0.256885173 -0.30623016 -0.575741265 0.0088396836 -0.30449289 0.04082688 0.382207318 0.41220849 0.27365221 0.106551384 -0.238256749 0.199922420 0.02749974 0.15787044
Sk.9C_fasta90_isoform.counts.tab -0.19353372 -0.099793641 -0.31920678 -0.1950810 0.03984040 -0.20541082 0.010793899 -0.15296328 0.107993305 0.1530724091 0.23356196 0.13127651 0.176044821 0.24642948 0.16228107 0.143314883 0.172446999 0.127549623 0.24479428 0.14909781
In [75]:
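names(mergedMEs)
# Pearson correlation of each module eigengene with each trait (use = "p": pairwise-complete observations)
moduleTraitCor = cor(mergedMEs, skogsTraits, use = "p");
# Student asymptotic p-values for those correlations
moduleTraitPvalue = corPvalueStudent(moduleTraitCor, nSamples);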
  1. 'MEgreen'
  2. 'MEmediumpurple3'
  3. 'MEbrown'
  4. 'MEgreen4'
  5. 'MEplum1'
  6. 'MEblue'
  7. 'MEnavajowhite'
  8. 'MEpink'
  9. 'MEdarkturquoise'
  10. 'MEmidnightblue'
  11. 'MEmistyrose'
  12. 'MEblue4'
  13. 'MEhoneydew'
  14. 'MEantiquewhite4'
  15. 'MEorange'
  16. 'MEblack'
  17. 'MEskyblue'
  18. 'MEdarkviolet'
  19. 'MEbrown2'
  20. 'MEskyblue1'
  21. 'MEdeeppink'
  22. 'MElightskyblue4'
  23. 'MEchocolate4'
  24. 'MElightpink2'
  25. 'MEblue2'
  26. 'MEcoral4'
  27. 'MEnavajowhite2'
  28. 'MEdarkseagreen2'
  29. 'MEdarkseagreen3'
  30. 'MEsalmon1'
  31. 'MEtan'
  32. 'MEbisque4'
  33. 'MEtan4'
  34. 'MEdarkmagenta'
  35. 'MEorangered'
  36. 'MEturquoise'
  37. 'MEpalevioletred1'
  38. 'MEyellow2'
  39. 'MEcoral1'
  40. 'MEmediumpurple4'
  41. 'MEdarkred'
  42. 'MEmediumpurple2'
  43. 'MEbrown4'
  44. 'MElightgreen'
  45. 'MElightpink3'
  46. 'MEantiquewhite2'
  47. 'MEcyan'
  48. 'MEdarkgrey'
  49. 'MElavenderblush3'
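`corPvalueStudent()` turns each module-trait correlation into a p-value with the Student t approximation, t = r * sqrt(n - 2) / sqrt(1 - r^2) on n - 2 degrees of freedom. As a sanity check, the base-R sketch below reimplements that conversion and compares it with `cor.test()`, which uses the same statistic for Pearson correlations. The helper `cor_pvalue_student()` and the simulated vectors `x` and `y` are illustrative and not part of the original analysis.

```r
# Two-sided Student t p-value for a Pearson correlation r computed from n samples:
# t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom
cor_pvalue_student <- function(r, n) {
  t_stat <- sqrt(n - 2) * r / sqrt(1 - r^2)
  2 * pt(abs(t_stat), df = n - 2, lower.tail = FALSE)
}

# Simulated example with 15 samples (the number of libraries in this analysis)
set.seed(1)
x <- rnorm(15)
y <- 0.6 * x + rnorm(15)
r <- cor(x, y)

p_manual  <- cor_pvalue_student(r, 15)
p_cortest <- cor.test(x, y)$p.value
print(c(manual = p_manual, cor.test = p_cortest))
```

Both values should agree to floating-point precision, which confirms what the numbers in parentheses in the heatmap represent.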
In [76]:
# Build the heatmap cell labels: each correlation with its p-value in parentheses
textMatrix = paste(signif(moduleTraitCor, 2), "\n(",
                   signif(moduleTraitPvalue, 1), ")", sep = "");
dim(textMatrix) = dim(moduleTraitCor)
In [77]:
par(mar = c(6, 10, 6, 1))
# Display the correlation values within a heatmap plot
labeledHeatmap(moduleTraitCor,
               xLabels = names(skogsTraits),
               yLabels = names(mergedMEs),
               ySymbols = names(mergedMEs),
               colorLabels = FALSE,
               colors = blueWhiteRed(50),
               textMatrix = textMatrix,
               setStdMargins = FALSE,
               cex.text = 0.5,
               textAdj = c(0.5, 0.5),
               zlim = c(-1, 1),
               main = "Module-trait relationships")
In [60]:
# library(repr)  # repr.plot.* options set the inline plot size in the Jupyter R kernel
options(repr.plot.width=10, repr.plot.height=10)

WGCNA Function

In [78]:
wgcna_adjacency <- function(datExpr, minModuleSize = 50, MEDissThres = 0.30, deepSplit = 2, power = 7) {

  # Compute the soft-thresholded adjacency, the TOM, and the gene dendrogram
  adjacency <- adjacency(datExpr, power = power)
  TOM <- TOMsimilarity(adjacency, TOMType = "signed")
  geneTree <- hclust(as.dist(1 - TOM), method = "average")

  # Module identification using dynamic tree cut (deepSplit is now passed through)
  dynamicMods <- cutreeDynamic(dendro = geneTree, distM = 1 - TOM, deepSplit = deepSplit,
                               pamRespectsDendro = FALSE, minClusterSize = minModuleSize)
  dynamicColors <- labels2colors(dynamicMods)

  # Calculate module eigengenes and cluster them by dissimilarity (1 - correlation)
  MEList <- moduleEigengenes(datExpr, colors = dynamicColors)
  MEs <- MEList$eigengenes
  METree <- hclust(as.dist(1 - cor(MEs)), method = "average")
  plot(METree, main = "Clustering of module eigengenes", xlab = "", sub = "")
  # Plot the merge cut line into the dendrogram
  abline(h = MEDissThres, col = "red")

  # Merge modules whose eigengenes are closer than MEDissThres
  merge <- mergeCloseModules(datExpr, dynamicColors, cutHeight = MEDissThres, verbose = 0)
  moduleColors <- merge$colors

  # Recalculate and order the eigengenes of the merged modules
  MEs <- orderMEs(moduleEigengenes(datExpr, moduleColors)$eigengenes)

  print(table(moduleColors))
  moduleColors <- as.data.frame(moduleColors)
  rownames(moduleColors) <- colnames(datExpr)

  return(list(adjacency = adjacency, MEs = MEs, moduleColors = moduleColors,
              dynamicMods = dynamicMods, geneTree = geneTree))
}
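The function relies on `TOMsimilarity()` for topological overlap. To make that quantity concrete, the sketch below computes the standard unsigned topological overlap, TOM_ij = (l_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij), where l_ij = sum_u a_iu * a_uj counts shared neighbours and k_i is the connectivity of gene i. It assumes a non-negative adjacency `|cor|^7` (same power as above), a small simulated expression matrix, and a hypothetical helper `tom_unsigned()`; it is a teaching sketch, not a replacement for the WGCNA implementation, which scales to large matrices and supports other TOM variants.

```r
# Unsigned topological overlap for a small adjacency matrix.
# A must be symmetric with entries in [0, 1]; its diagonal is ignored.
tom_unsigned <- function(A) {
  diag(A) <- 0
  k <- rowSums(A)                      # connectivity of each gene
  L <- A %*% A                         # l_ij = sum over u of a_iu * a_uj
  denom <- outer(k, k, pmin) + 1 - A   # min(k_i, k_j) + 1 - a_ij
  TOM <- (L + A) / denom
  diag(TOM) <- 1
  TOM
}

# Toy expression matrix: 15 samples x 6 genes (simulated)
set.seed(42)
expr <- matrix(rnorm(15 * 6), nrow = 15)
expr[, 2] <- expr[, 1] + rnorm(15, sd = 0.3)   # genes 1 and 2 co-vary

A <- abs(cor(expr))^7          # soft-thresholded (unsigned) adjacency
TOM <- tom_unsigned(A)
dissTOM <- 1 - TOM             # the distance passed to hclust() in the function above
round(TOM, 3)
```

Genes 1 and 2 share a strong direct link, so their overlap stands out against the near-zero values for the independent genes.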
In [79]:
wgcna.results <- wgcna_adjacency(datExpr, MEDissThres = 0.30)
..connectivity..
..matrix multiplication (system BLAS)..
..normalization..
..done.
 ..cutHeight not given, setting it to 0.993  ===>  99% of the (truncated) height range in dendro.
 ..done.
moduleColors
 antiquewhite2  antiquewhite4        bisque4          black           blue 
           489            594            164            619            676 
         blue2          blue4          brown         brown2         brown4 
          1953            109            800            239            848 
    chocolate4         coral1         coral4           cyan       darkgrey 
           220            699           2352           1183           1265 
   darkmagenta        darkred  darkseagreen2  darkseagreen3  darkturquoise 
           798            230             99            236            548 
    darkviolet       deeppink          green         green4       honeydew 
           559            108            407             99            225 
lavenderblush3     lightgreen     lightpink2     lightpink3  lightskyblue4 
           697            911            101            127             73 
 mediumpurple2  mediumpurple3  mediumpurple4   midnightblue      mistyrose 
           224            183            121            256            251 
   navajowhite   navajowhite2         orange      orangered palevioletred1 
           103            151            221            966            103 
          pink          plum1        salmon1        skyblue       skyblue1 
           365            328            104            214            140 
           tan           tan4      turquoise        yellow2 
           509            104           4054             84 
In [80]:
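# Rebuild the merged module colors and eigengenes from the dynamic tree cut labels
# (wgcna_adjacency() returns the merged colors and ordered eigengenes, but not the merge object)
dynamicColors <- labels2colors(wgcna.results$dynamicMods)
merge <- mergeCloseModules(datExpr, dynamicColors, cutHeight = 0.30, verbose = 0)
mergedColors <- merge$colors
mergedMEs <- merge$newMEs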
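With `cutHeight = 0.30`, `mergeCloseModules()` merges modules whose eigengene dissimilarity, 1 - cor, falls below 0.30, that is, eigengenes correlated above 0.70. The sketch below shows a single pass of that idea on three simulated eigengenes (names and values are hypothetical): cluster the eigengenes with average linkage and cut the tree at the merge height. The WGCNA function also recomputes eigengenes after merging and, by default, repeats until nothing more merges, so treat this as an illustration only.

```r
set.seed(7)
n <- 15
base <- rnorm(n)

# Three toy eigengenes: MEred and MEblue share a profile, MEgreen is its mirror image
MEs_toy <- data.frame(
  MEred   =  base + rnorm(n, sd = 0.2),
  MEblue  =  base + rnorm(n, sd = 0.2),
  MEgreen = -base + rnorm(n, sd = 0.2)
)

cutHeight <- 0.30
ME_diss <- as.dist(1 - cor(MEs_toy))           # eigengene dissimilarity
ME_tree <- hclust(ME_diss, method = "average")
merge_groups <- cutree(ME_tree, h = cutHeight) # modules sharing a group would be merged
print(merge_groups)
```

Because the dissimilarity is 1 - cor rather than 1 - |cor|, the anti-correlated `MEgreen` stays in its own group while `MEred` and `MEblue` collapse into one.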

Determine the number of Skogsbergia sp. co-expression modules that contain BCN one-to-one orthologs

In [110]:
vsd_matrix_df <- as.data.frame(vsd_matrix)
In [111]:
vsd_matrix_df$colors <- wgcna.results[["moduleColors"]]$moduleColors
In [112]:
head(vsd_matrix_df)
A data.frame: 6 × 16
Sk.10A_fasta90_isoform.counts.tab Sk.10B_fasta90_isoform.counts.tab Sk.10C_fasta90_isoform.counts.tab Sk.6A_fasta90_isoform.counts.tab Sk.6B_fasta90_isoform.counts.tab Sk.6C_fasta90_isoform..counts.tab Sk.7A_fasta90_isoform.counts.tab Sk.7B_fasta90_isoform.counts.tab Sk.7C_fasta90_isoform.counts.tab Sk.8A_fasta90_isoform.counts.tab Sk.8B_fasta90_isoform..counts.tab Sk.8C_fasta90_isoform..counts.tab Sk.9A_fasta90_isoform.counts.tab Sk.9B_fasta90_isoform..counts.tab Sk.9C_fasta90_isoform.counts.tab colors
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
TRINITY_DN0_c0_g1_i2 -0.4833034 4.2061223 8.412143 -0.4833034 1.8396698 8.822247 3.818597 2.2941273 6.803391 4.036179 4.855381 6.139561 3.083563 2.9608213 7.977188 turquoise
TRINITY_DN0_c0_g3_i1 3.5130787 1.2863178 5.388425 3.0888149 5.4150465 6.882231 3.633338 1.8785281 4.244778 3.052279 2.128651 4.051979 2.486303 -0.4833034 5.060960 turquoise
TRINITY_DN0_c0_g4_i2 2.2217544 5.0482202 9.112971 3.5962822 -0.4833034 9.989851 4.834828 4.0794694 8.557138 6.440424 7.638586 7.961867 4.313425 6.9771491 9.330997 brown
TRINITY_DN100005_c0_g1_i1 3.7108665 2.8977955 2.870251 -0.4833034 3.3989581 2.296414 3.300928 1.8785281 2.016921 3.052279 1.150548 2.643361 3.083563 3.8314892 2.204853 coral4
TRINITY_DN100010_c0_g1_i1 -0.4833034 1.2863178 3.187627 3.4462981 -0.4833034 2.989190 1.941278 1.8785281 2.990843 2.575878 3.089022 3.418334 1.411223 -0.4833034 2.045164 coral4
TRINITY_DN10001_c0_g2_i1 -0.4833034 -0.4833034 2.099944 1.8759402 -0.4833034 1.880597 2.236582 -0.4833034 2.016921 2.259089 3.253304 2.248644 3.083563 3.3743912 3.227663 blue2
In [113]:
# copy the row names (transcript IDs) into a transcript_id column
vsd_matrix_df$transcript_id <-rownames(vsd_matrix_df)
In [114]:
transcripts_to_module <- na.omit(vsd_matrix_df) %>% dplyr::select(transcript_id, colors)
In [115]:
head(transcripts_to_module)
A data.frame: 6 × 2
transcript_id colors
<chr> <chr>
TRINITY_DN0_c0_g1_i2 TRINITY_DN0_c0_g1_i2 turquoise
TRINITY_DN0_c0_g3_i1 TRINITY_DN0_c0_g3_i1 turquoise
TRINITY_DN0_c0_g4_i2 TRINITY_DN0_c0_g4_i2 brown
TRINITY_DN100005_c0_g1_i1 TRINITY_DN100005_c0_g1_i1 coral4
TRINITY_DN100010_c0_g1_i1 TRINITY_DN100010_c0_g1_i1 coral4
TRINITY_DN10001_c0_g2_i1 TRINITY_DN10001_c0_g2_i1 blue2
In [116]:
BCN_on_to_one_orthologs_skogs_networks <- transcripts_to_module %>% filter(transcript_id %in% BCN_one_to_one_orthologs_skogs$transcript_id)
In [117]:
BCN_on_to_one_orthologs_skogs_networks_ordered_color <- BCN_on_to_one_orthologs_skogs_networks %>% arrange(colors)
In [118]:
BCN_on_to_one_orthologs_skogs_networks_ordered_color
A data.frame: 51 × 2
transcript_id colors
<chr> <chr>
TRINITY_DN3355_c0_g1_i2 TRINITY_DN3355_c0_g1_i2 antiquewhite2
TRINITY_DN2507_c0_g1_i2 TRINITY_DN2507_c0_g1_i2 antiquewhite4
TRINITY_DN2286_c0_g1_i5 TRINITY_DN2286_c0_g1_i5 bisque4
TRINITY_DN3218_c0_g4_i1 TRINITY_DN3218_c0_g4_i1 bisque4
TRINITY_DN1706_c0_g1_i1 TRINITY_DN1706_c0_g1_i1 brown
TRINITY_DN177_c0_g1_i4 TRINITY_DN177_c0_g1_i4 brown
TRINITY_DN82963_c0_g1_i1 TRINITY_DN82963_c0_g1_i1 brown2
TRINITY_DN16890_c0_g1_i1 TRINITY_DN16890_c0_g1_i1 brown4
TRINITY_DN42_c0_g1_i1 TRINITY_DN42_c0_g1_i1 brown4
TRINITY_DN1749_c0_g1_i2 TRINITY_DN1749_c0_g1_i2 chocolate4
TRINITY_DN696_c0_g1_i1 TRINITY_DN696_c0_g1_i1 chocolate4
TRINITY_DN10898_c0_g2_i5 TRINITY_DN10898_c0_g2_i5 coral4
TRINITY_DN16156_c0_g1_i1 TRINITY_DN16156_c0_g1_i1 coral4
TRINITY_DN21195_c0_g1_i1 TRINITY_DN21195_c0_g1_i1 coral4
TRINITY_DN2400_c0_g1_i1 TRINITY_DN2400_c0_g1_i1 coral4
TRINITY_DN39451_c0_g1_i1 TRINITY_DN39451_c0_g1_i1 coral4
TRINITY_DN42737_c0_g1_i1 TRINITY_DN42737_c0_g1_i1 coral4
TRINITY_DN274_c0_g1_i2 TRINITY_DN274_c0_g1_i2 cyan
TRINITY_DN6920_c0_g1_i1 TRINITY_DN6920_c0_g1_i1 cyan
TRINITY_DN14586_c0_g3_i1 TRINITY_DN14586_c0_g3_i1 darkgrey
TRINITY_DN21940_c0_g1_i2 TRINITY_DN21940_c0_g1_i2 darkgrey
TRINITY_DN27140_c0_g1_i1 TRINITY_DN27140_c0_g1_i1 darkgrey
TRINITY_DN11556_c0_g1_i1 TRINITY_DN11556_c0_g1_i1 darkmagenta
TRINITY_DN16940_c0_g1_i1 TRINITY_DN16940_c0_g1_i1 darkmagenta
TRINITY_DN174_c0_g1_i6 TRINITY_DN174_c0_g1_i6 darkmagenta
TRINITY_DN33065_c0_g1_i1 TRINITY_DN33065_c0_g1_i1 darkmagenta
TRINITY_DN4646_c0_g2_i1 TRINITY_DN4646_c0_g2_i1 darkturquoise
TRINITY_DN908_c0_g2_i1 TRINITY_DN908_c0_g2_i1 darkviolet
TRINITY_DN3729_c0_g1_i3 TRINITY_DN3729_c0_g1_i3 green
TRINITY_DN14703_c0_g1_i6 TRINITY_DN14703_c0_g1_i6 lavenderblush3
TRINITY_DN1778_c0_g1_i2 TRINITY_DN1778_c0_g1_i2 lavenderblush3
TRINITY_DN4229_c0_g2_i1 TRINITY_DN4229_c0_g2_i1 midnightblue
TRINITY_DN565_c0_g1_i1 TRINITY_DN565_c0_g1_i1 midnightblue
TRINITY_DN7244_c4_g1_i1 TRINITY_DN7244_c4_g1_i1 mistyrose
TRINITY_DN41835_c0_g1_i1 TRINITY_DN41835_c0_g1_i1 navajowhite2
TRINITY_DN17668_c0_g1_i1 TRINITY_DN17668_c0_g1_i1 plum1
TRINITY_DN1137_c0_g1_i1 TRINITY_DN1137_c0_g1_i1 tan
TRINITY_DN24066_c0_g1_i1 TRINITY_DN24066_c0_g1_i1 tan
TRINITY_DN11085_c0_g1_i2 TRINITY_DN11085_c0_g1_i2 turquoise
TRINITY_DN1360_c0_g1_i1 TRINITY_DN1360_c0_g1_i1 turquoise
TRINITY_DN3044_c0_g1_i2 TRINITY_DN3044_c0_g1_i2 turquoise
TRINITY_DN3422_c1_g1_i1 TRINITY_DN3422_c1_g1_i1 turquoise
TRINITY_DN3446_c0_g2_i1 TRINITY_DN3446_c0_g2_i1 turquoise
TRINITY_DN4013_c0_g1_i1 TRINITY_DN4013_c0_g1_i1 turquoise
TRINITY_DN51520_c0_g1_i1 TRINITY_DN51520_c0_g1_i1 turquoise
TRINITY_DN671_c0_g1_i1 TRINITY_DN671_c0_g1_i1 turquoise
TRINITY_DN735_c0_g1_i2 TRINITY_DN735_c0_g1_i2 turquoise
TRINITY_DN7748_c0_g1_i4 TRINITY_DN7748_c0_g1_i4 turquoise
TRINITY_DN87164_c0_g1_i1 TRINITY_DN87164_c0_g1_i1 turquoise
TRINITY_DN94583_c0_g1_i1 TRINITY_DN94583_c0_g1_i1 turquoise
TRINITY_DN21901_c0_g1_i1 TRINITY_DN21901_c0_g1_i1 yellow2
In [119]:
# the 51 expressed BCN one-to-one orthologs are found across 22 modules
length(unique(BCN_on_to_one_orthologs_skogs_networks_ordered_color$colors))
22
In [120]:
as.data.frame(table(BCN_on_to_one_orthologs_skogs_networks_ordered_color$colors))
A data.frame: 22 × 2
Var1 Freq
<fct> <int>
antiquewhite2 1
antiquewhite4 1
bisque4 2
brown 2
brown2 1
brown4 2
chocolate4 2
coral4 6
cyan 2
darkgrey 3
darkmagenta 4
darkturquoise 1
darkviolet 1
green 1
lavenderblush3 2
midnightblue 2
mistyrose 1
navajowhite2 1
plum1 1
tan 2
turquoise 12
yellow2 1

Module info

To test whether co-expression modules are biologically meaningful, modules correlated with eye tissue were checked for genes related to eye function and development.
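In the notebook the green module is carried forward as the eye-associated module. One way to make such a choice explicit is to filter the module-trait correlations by sign and p-value. The sketch below does this on a made-up three-module matrix; the trait name `eye`, the 15-sample size, and the 0.05 cutoff are illustrative assumptions, and with the real objects the same filter would be applied to `moduleTraitCor` and `moduleTraitPvalue`.

```r
# Made-up correlations of three module eigengenes with two hypothetical traits
moduleTraitCor_toy <- cbind(
  eye = c(MEgreen = 0.85, MEblue = -0.10, MEbrown = 0.40),
  gut = c(MEgreen = 0.05, MEblue =  0.70, MEbrown = -0.60)
)
n_samples <- 15

# Student t p-values for the "eye" column (same formula as corPvalueStudent)
r_eye <- moduleTraitCor_toy[, "eye"]
t_eye <- sqrt(n_samples - 2) * r_eye / sqrt(1 - r_eye^2)
p_eye <- 2 * pt(abs(t_eye), df = n_samples - 2, lower.tail = FALSE)

# Keep modules whose eigengene is positively and significantly correlated with "eye"
eye_modules <- names(r_eye)[r_eye > 0 & p_eye < 0.05]
print(eye_modules)
```

Here `MEbrown` is positively correlated but not significant at 15 samples (p is roughly 0.14), so only `MEgreen` passes the filter.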

In [81]:
SubGeneNames = colnames(datExpr)
In [83]:
green_eye_module=as.data.frame(SubGeneNames[which(mergedColors=="green")])
# join the green-module transcripts to their Trinotate annotations
names(green_eye_module)[1] <- "transcript_id"
green_eye_module_trinotate <- setDT(Trinotate_lym_subset_skogs, key = 'transcript_id')[J(green_eye_module)]
#write.csv( green_eye_module, file = "green_eye_module_Skogs.csv")
In [85]:
head(green_eye_module_trinotate)
A data.table: 6 × 17
#gene_id transcript_id sprot_Top_BLASTX_hit RNAMMER prot_id prot_coords sprot_Top_BLASTP_hit Pfam SignalP TmHMM eggnog Kegg gene_ontology_BLASTX gene_ontology_BLASTP gene_ontology_Pfam transcript peptide
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
TRINITY_DN101698_c0_g1TRINITY_DN101698_c0_g1_i1. .. . . . . ... . . ...
TRINITY_DN10174_c0_g2 TRINITY_DN10174_c0_g2_i10. .TRINITY_DN10174_c0_g2_i10.p11010-1480[+]. . . ... . . ...
TRINITY_DN10190_c3_g1 TRINITY_DN10190_c3_g1_i1 . .. . . . . ... . . ...
TRINITY_DN10265_c0_g2 TRINITY_DN10265_c0_g2_i1 OTUBL_DROME^OTUBL_DROME^Q:19-780,H:1-261^53.64%ID^E:1.15e-94^RecName: Full=Ubiquitin thioesterase otubain-like;^Eukaryota; Metazoa; Ecdysozoa; Arthropoda; Hexapoda; Insecta; Pterygota; Neoptera; Endopterygota; Diptera; Brachycera; Muscomorpha; Ephydroidea; Drosophilidae; Drosophila; Sophophora.TRINITY_DN10265_c0_g2_i1.p1 19-783[+] OTUBL_DROME^OTUBL_DROME^Q:1-254,H:1-261^53.64%ID^E:7.52e-99^RecName: Full=Ubiquitin thioesterase otubain-like;^Eukaryota; Metazoa; Ecdysozoa; Arthropoda; Hexapoda; Insecta; Pterygota; Neoptera; Endopterygota; Diptera; Brachycera; Muscomorpha; Ephydroidea; Drosophilidae; Drosophila; SophophoraPF10275.12^Peptidase_C65^Peptidase C65 Otubain^27-254^E:1.3e-77 . ..KEGG:dme:Dmel_CG4968GO:0005634^cellular_component^nucleus`GO:0004843^molecular_function^cysteine-type deubiquitinase activity`GO:0043130^molecular_function^ubiquitin binding`GO:0071108^biological_process^protein K48-linked deubiquitination GO:0005634^cellular_component^nucleus`GO:0004843^molecular_function^cysteine-type deubiquitinase activity`GO:0043130^molecular_function^ubiquitin binding`GO:0071108^biological_process^protein K48-linked deubiquitination ...
TRINITY_DN103043_c0_g1TRINITY_DN103043_c0_g1_i1. .. . . . . ... . . ...
TRINITY_DN10307_c0_g1 TRINITY_DN10307_c0_g1_i1 CNPY2_HUMAN^CNPY2_HUMAN^Q:832-371,H:20-172^35.714%ID^E:9.36e-31^RecName: Full=Protein canopy homolog 2;^Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo .TRINITY_DN10307_c0_g1_i1.p1 344-919[-] CNPY2_HUMAN^CNPY2_HUMAN^Q:30-183,H:20-172^35.714%ID^E:4.52e-32^RecName: Full=Protein canopy homolog 2;^Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo PF11938.11^DUF3456^TLR4 regulator and MIR-interacting MSAP^37-182^E:1.1e-38sigP:1^32^0.519..KEGG:hsa:10330 GO:0005783^cellular_component^endoplasmic reticulum`GO:0010629^biological_process^negative regulation of gene expression`GO:1905599^biological_process^positive regulation of low-density lipoprotein receptor activity`GO:0010988^biological_process^regulation of low-density lipoprotein particle clearanceGO:0005783^cellular_component^endoplasmic reticulum`GO:0010629^biological_process^negative regulation of gene expression`GO:1905599^biological_process^positive regulation of low-density lipoprotein receptor activity`GO:0010988^biological_process^regulation of low-density lipoprotein particle clearance...

Determine if BCN one-to-one orthologs are conserved in Skogsbergia sp. networks

To determine whether the BCN is conserved, a randomization test was performed. It asks whether the 51 BCN one-to-one orthologs expressed in the Skogsbergia sp. network fall into fewer co-expression modules than random sets of 51 transcripts drawn from all expressed one-to-one orthologs.
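The logic of the test can be sketched end to end in base R on simulated data: count the distinct modules hit by the focal set, build a null distribution from random sets of the same size, and compute a one-sided empirical p-value. The toy module map, the hypothetical helper `count_modules()`, the seed, and the add-one p-value correction are illustrative choices, not taken from the original analysis. Fixing a seed with `set.seed()` makes the random draws reproducible.

```r
# Count how many distinct modules a set of transcript IDs falls into
count_modules <- function(ids, transcript_to_module) {
  hits <- transcript_to_module$transcript_id %in% ids
  length(unique(transcript_to_module$colors[hits]))
}

# Toy data: 300 transcripts spread over 20 modules (all values simulated)
set.seed(123)
toy_map <- data.frame(
  transcript_id = sprintf("tx%03d", 1:300),
  colors = sample(sprintf("module%02d", 1:20), 300, replace = TRUE),
  stringsAsFactors = FALSE
)

# A hypothetical "conserved" set of 15 transcripts concentrated in 3 modules
in_three <- toy_map$colors %in% c("module01", "module02", "module03")
focal_ids <- toy_map$transcript_id[in_three][1:15]
observed <- count_modules(focal_ids, toy_map)

# Null distribution: module counts for 10,000 random sets of the same size
null_counts <- replicate(10000, count_modules(
  sample(toy_map$transcript_id, length(focal_ids), replace = FALSE), toy_map))

# One-sided empirical p-value (add-one correction) for "fewer modules than expected"
p_value <- (sum(null_counts <= observed) + 1) / (length(null_counts) + 1)
print(c(observed = observed, median_null = median(null_counts), p = p_value))
```

With the notebook's own objects the observed value is 22 modules and the null distribution is `matrix_3`, so the analogous p-value would be `(sum(matrix_3 <= 22) + 1) / (length(matrix_3) + 1)`.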

In [89]:
#import the BCN one-to-one ortholog sheet 

BCN_one_to_one_orthologs_skogs <- read.csv("BCN_one_to_one_orthologs_skogs_ids_unlist_removep_unique.csv" ,header = TRUE, row.names=1, stringsAsFactors = FALSE)
In [90]:
head(BCN_one_to_one_orthologs_skogs)
A data.frame: 6 × 1
x
<chr>
1 TRINITY_DN177_c0_g1_i4
2 TRINITY_DN3729_c0_g1_i3
3 TRINITY_DN21901_c0_g1_i1
4 TRINITY_DN27140_c0_g1_i1
5 TRINITY_DN671_c0_g1_i1
6 TRINITY_DN4646_c0_g2_i1
In [91]:
colnames(BCN_one_to_one_orthologs_skogs)[1] <- "transcript_id"
In [92]:
# 79 BCN one-to-one orthologs were identified between the V. tsujii and Skogsbergia sp. transcriptomes, but only 51 of these transcripts are expressed in the Skogsbergia sp. network
nrow(BCN_one_to_one_orthologs_skogs)
79
In [93]:
# import ALL one-to-one orthologs between the V. tsujii and Skogsbergia sp. transcriptomes
In [94]:
ALL_one_to_one_orthologs_vtsujii_skogs <- read.csv("ALL_one_to_one_orthologs_vtsujii_skogs.csv" ,header = TRUE, row.names=1, stringsAsFactors = FALSE)
In [95]:
head(ALL_one_to_one_orthologs_vtsujii_skogs)
A data.frame: 6 × 3
Orthogroup tsujii skogs
<chr> <chr> <chr>
1 OG0000001 NODE_34210_length_483_cov_3.08178_g27126_i0 TRINITY_DN95414_c0_g1_i1
2 OG0000001 NODE_28302_length_632_cov_2.16118_g21539_i0 TRINITY_DN15305_c0_g1_i1
3 OG0000001 NODE_48125_length_323_cov_1.49254_g40905_i0 TRINITY_DN12206_c0_g2_i4
4 OG0000001 NODE_40629_length_390_cov_2.93731_g33431_i0 TRINITY_DN18899_c0_g1_i1
5 OG0000001 NODE_31640_length_541_cov_1.62963_g24647_i0 TRINITY_DN70069_c0_g1_i1
6 OG0000001 NODE_22914_length_844_cov_2.43599_g16758_i0 TRINITY_DN22758_c4_g1_i1
In [96]:
# keep only the Skogsbergia sp. column for the randomization test
ALL_one_to_one_orthologs_vtsujii_skogs_SUBSET_SKOGS <-  subset(ALL_one_to_one_orthologs_vtsujii_skogs, select ="skogs")
In [97]:
head(ALL_one_to_one_orthologs_vtsujii_skogs_SUBSET_SKOGS)
A data.frame: 6 × 1
skogs
<chr>
1 TRINITY_DN95414_c0_g1_i1
2 TRINITY_DN15305_c0_g1_i1
3 TRINITY_DN12206_c0_g2_i4
4 TRINITY_DN18899_c0_g1_i1
5 TRINITY_DN70069_c0_g1_i1
6 TRINITY_DN22758_c4_g1_i1
In [ ]:
# remove the peptide suffix (.p1, .p2, ...) to recover the transcript IDs

ALL_one_to_one_orthologs_vtsujii_skogs_SUBSET_SKOGS_removep <- ALL_one_to_one_orthologs_vtsujii_skogs_SUBSET_SKOGS %>% separate(skogs, c("transcript_id", "extra"), sep ="\\.p")
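Peptide IDs produced by TransDecoder append a `.p<n>` suffix to the transcript ID (compare the `prot_id` column of the Trinotate table above). A base-R alternative to `separate()` is to strip that suffix with an end-anchored regular expression, which leaves IDs without a suffix unchanged and creates no extra column to drop. The IDs below reuse transcripts from this notebook with illustrative suffixes.

```r
# Illustrative peptide IDs: two with a .p<n> suffix, one without
peptide_ids <- c("TRINITY_DN95414_c0_g1_i1.p1",
                 "TRINITY_DN15305_c0_g1_i1.p2",
                 "TRINITY_DN12206_c0_g2_i4")

# Drop a trailing ".p" followed by digits, then de-duplicate
transcript_ids <- unique(sub("\\.p[0-9]+$", "", peptide_ids))
print(transcript_ids)
```

Applied to the data frame, the same call would be `unique(sub("\\.p[0-9]+$", "", ALL_one_to_one_orthologs_vtsujii_skogs_SUBSET_SKOGS$skogs))`.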
In [99]:
ALL_one_to_one_orthologs_vtsujii_skogs_SUBSET_SKOGS_removep$extra <- NULL
In [100]:
ALL_one_to_one_orthologs_vtsujii_skogs_SUBSET_SKOGS_removep_rmdup <- ALL_one_to_one_orthologs_vtsujii_skogs_SUBSET_SKOGS_removep %>% distinct()
In [101]:
nrow(ALL_one_to_one_orthologs_vtsujii_skogs_SUBSET_SKOGS_removep_rmdup)
4942
In [102]:
# now restrict this to transcripts expressed in the Skogsbergia sp. network (the WGCNA input)

dds_merged_table_prefiltered_wgcna_transcript_ids <- as.data.frame(rownames(dds_merged_table_prefiltered_wgcna))
In [103]:
colnames(dds_merged_table_prefiltered_wgcna_transcript_ids)[1] <- "transcript_id"
In [104]:
#now subset

ALL_one_to_one_orthologs_skogs_SUBSET_to_wgcna_expression_input  <- subset(ALL_one_to_one_orthologs_vtsujii_skogs_SUBSET_SKOGS_removep_rmdup, transcript_id %in% dds_merged_table_prefiltered_wgcna_transcript_ids$transcript_id)
In [105]:
head(ALL_one_to_one_orthologs_skogs_SUBSET_to_wgcna_expression_input)
A data.frame: 6 × 1
transcript_id
<chr>
8 TRINITY_DN9569_c0_g2_i11
10 TRINITY_DN73_c0_g1_i2
11 TRINITY_DN56829_c0_g1_i1
12 TRINITY_DN1643_c0_g1_i1
20 TRINITY_DN56429_c0_g1_i1
26 TRINITY_DN10424_c1_g2_i1
In [106]:
nrow(ALL_one_to_one_orthologs_skogs_SUBSET_to_wgcna_expression_input)
2653
In [107]:
# perform the randomization test: draw 10,000 random sets of 51 transcripts (the number of
# BCN one-to-one orthologs expressed in the network) from the 2,653 expressed one-to-one orthologs
ALL_one_to_one_orthologs_skogs_df_repeat10000 <- replicate(10000, sample(ALL_one_to_one_orthologs_skogs_SUBSET_to_wgcna_expression_input$transcript_id, size = 51, replace = FALSE))
In [108]:
head(ALL_one_to_one_orthologs_skogs_df_repeat10000)
A matrix: 6 × 10000 of type chr
TRINITY_DN14811_c0_g2_i4 TRINITY_DN19387_c0_g1_i1 TRINITY_DN6409_c1_g2_i1 TRINITY_DN3067_c0_g1_i1 TRINITY_DN76446_c0_g1_i1 TRINITY_DN8168_c0_g1_i3 TRINITY_DN7495_c0_g2_i1 TRINITY_DN4926_c0_g1_i1 TRINITY_DN24054_c0_g1_i2 TRINITY_DN346_c0_g1_i10 TRINITY_DN17668_c0_g1_i1 TRINITY_DN94358_c0_g1_i1 TRINITY_DN4757_c0_g1_i1 TRINITY_DN1277_c0_g1_i1 TRINITY_DN1207_c2_g1_i7 TRINITY_DN6841_c0_g4_i1 TRINITY_DN3697_c0_g1_i1 TRINITY_DN537_c1_g1_i3 TRINITY_DN52483_c0_g1_i1 TRINITY_DN2541_c0_g1_i2
TRINITY_DN18173_c1_g2_i1 TRINITY_DN9475_c0_g2_i3 TRINITY_DN19505_c0_g1_i2 TRINITY_DN871_c1_g1_i1 TRINITY_DN8798_c0_g2_i1 TRINITY_DN49829_c0_g1_i1 TRINITY_DN88612_c0_g1_i1 TRINITY_DN22292_c1_g1_i1 TRINITY_DN2512_c0_g1_i7 TRINITY_DN78344_c0_g1_i1 TRINITY_DN2560_c0_g1_i6 TRINITY_DN188_c0_g1_i1 TRINITY_DN1091_c5_g1_i1 TRINITY_DN50396_c0_g1_i1 TRINITY_DN3027_c1_g1_i1 TRINITY_DN5552_c0_g3_i1 TRINITY_DN315_c0_g2_i1 TRINITY_DN56244_c0_g1_i1 TRINITY_DN8515_c0_g1_i2 TRINITY_DN18120_c0_g1_i2
TRINITY_DN86044_c0_g1_i1 TRINITY_DN159_c0_g1_i5 TRINITY_DN4882_c0_g1_i2 TRINITY_DN4812_c0_g1_i1 TRINITY_DN668_c1_g1_i1 TRINITY_DN26170_c0_g1_i2 TRINITY_DN3586_c0_g1_i2 TRINITY_DN17231_c0_g1_i1 TRINITY_DN6806_c1_g1_i4 TRINITY_DN2343_c0_g1_i1 TRINITY_DN14205_c0_g2_i4 TRINITY_DN5257_c0_g2_i3 TRINITY_DN2285_c0_g1_i1 TRINITY_DN9573_c0_g4_i1 TRINITY_DN4926_c0_g1_i1 TRINITY_DN4573_c0_g1_i2 TRINITY_DN2373_c0_g1_i1 TRINITY_DN88412_c0_g1_i1 TRINITY_DN2368_c0_g3_i1 TRINITY_DN34621_c0_g1_i1
TRINITY_DN3089_c1_g1_i2 TRINITY_DN18054_c0_g3_i2 TRINITY_DN33258_c0_g1_i1 TRINITY_DN26828_c0_g1_i1 TRINITY_DN21370_c0_g1_i1 TRINITY_DN87_c0_g3_i1 TRINITY_DN76_c0_g1_i1 TRINITY_DN4438_c0_g1_i10 TRINITY_DN5476_c0_g1_i6 TRINITY_DN19694_c0_g1_i2 TRINITY_DN13695_c0_g1_i2 TRINITY_DN7495_c0_g2_i1 TRINITY_DN21117_c0_g1_i1 TRINITY_DN2115_c0_g1_i2 TRINITY_DN21213_c0_g1_i1 TRINITY_DN5832_c0_g1_i1 TRINITY_DN42008_c0_g1_i1 TRINITY_DN13849_c0_g1_i1 TRINITY_DN7202_c0_g1_i1 TRINITY_DN3586_c1_g3_i2
TRINITY_DN28789_c0_g1_i3 TRINITY_DN10639_c2_g1_i1 TRINITY_DN19170_c0_g1_i1 TRINITY_DN20601_c0_g1_i1 TRINITY_DN14161_c0_g1_i2 TRINITY_DN5116_c1_g1_i1 TRINITY_DN2782_c0_g1_i1 TRINITY_DN20555_c0_g1_i1 TRINITY_DN10190_c1_g1_i4 TRINITY_DN278_c1_g1_i2 TRINITY_DN12950_c0_g1_i6 TRINITY_DN23452_c0_g1_i2 TRINITY_DN77121_c0_g1_i1 TRINITY_DN2368_c0_g3_i1 TRINITY_DN10814_c0_g1_i1 TRINITY_DN10_c1_g1_i1 TRINITY_DN77125_c0_g1_i1 TRINITY_DN97832_c0_g1_i1 TRINITY_DN33303_c0_g1_i2 TRINITY_DN31990_c0_g1_i1
TRINITY_DN4794_c0_g1_i1 TRINITY_DN15266_c0_g1_i1 TRINITY_DN2511_c0_g1_i5 TRINITY_DN16744_c0_g2_i3 TRINITY_DN76032_c0_g1_i1 TRINITY_DN12545_c1_g1_i1 TRINITY_DN5892_c3_g1_i1 TRINITY_DN9583_c0_g1_i2 TRINITY_DN33980_c0_g1_i1 TRINITY_DN2866_c0_g1_i1 TRINITY_DN76169_c0_g1_i1 TRINITY_DN10600_c0_g1_i1 TRINITY_DN13_c0_g1_i1 TRINITY_DN10688_c0_g2_i6 TRINITY_DN55_c0_g1_i1 TRINITY_DN41839_c0_g1_i3 TRINITY_DN3974_c0_g1_i1 TRINITY_DN1410_c0_g1_i1 TRINITY_DN6713_c1_g1_i3 TRINITY_DN13830_c0_g1_i1
In [121]:
# For each of the 10,000 random sets, count the number of distinct Skogsbergia sp.
# modules its transcripts fall into (the null distribution for the randomization test)
n_sets <- ncol(ALL_one_to_one_orthologs_skogs_df_repeat10000)
matrix_1 <- vector("list", n_sets)   # transcript IDs in each random set
matrix_2 <- vector("list", n_sets)   # module assignments of those transcripts
matrix_3 <- integer(n_sets)          # number of distinct modules per set

for (i in seq_len(n_sets)) {
  matrix_1[[i]] <- ALL_one_to_one_orthologs_skogs_df_repeat10000[, i]
  matrix_2[[i]] <- transcripts_to_module %>% filter(transcript_id %in% matrix_1[[i]])
  matrix_3[[i]] <- length(unique(matrix_2[[i]]$colors))
}

print(matrix_3)
    [1] 19 23 23 21 19 23 21 18 23 25 19 29 21 23 23 17 21 25 21 20 18 18 26 16
   [25] 27 26 22 23 21 26 22 23 23 21 21 22 22 23 22 24 21 23 19 24 25 23 18 24
   [49] 20 20 24 22 21 25 24 20 22 22 24 22 18 22 19 24 24 20 27 22 25 26 22 17
   [73] 21 21 20 19 24 26 27 22 25 17 25 23 20 23 21 23 22 25 24 23 23 20 19 22
   [97] 20 26 22 27 20 20 24 24 24 21 24 24 21 25 17 26 21 24 22 26 22 20 24 25
  [121] 20 24 25 21 23 27 22 23 20 24 24 21 22 21 19 23 23 24 25 22 24 19 20 20
  [145] 17 20 17 17 21 26 25 24 25 22 20 20 24 19 25 23 19 21 23 20 23 21 19 25
  [169] 21 20 21 20 22 21 24 21 21 23 20 22 22 24 20 25 16 29 23 24 18 18 25 21
  [193] 26 21 20 20 23 22 23 22 22 25 21 20 24 21 20 21 29 19 22 22 22 21 22 20
  [217] 25 20 20 21 28 19 26 23 24 21 25 22 26 26 22 22 20 22 20 21 26 22 26 21
  [241] 25 24 23 20 24 24 22 22 18 23 23 25 24 20 20 24 25 25 24 21 22 21 25 23
  [265] 23 20 23 24 22 21 20 23 19 24 25 24 25 21 20 26 20 21 20 20 20 23 20 23
  [289] 24 23 20 24 20 21 24 24 20 21 21 23 19 21 21 23 21 23 21 26 25 20 23 23
  [313] 19 24 25 25 24 20 25 21 24 18 23 22 20 26 19 23 24 24 22 23 20 20 25 29
  [337] 23 21 21 24 21 21 21 23 24 27 23 22 23 19 22 18 23 25 26 23 23 20 25 20
  [361] 24 24 24 23 23 22 22 22 25 20 22 25 19 23 26 24 21 24 21 23 22 24 22 20
  [385] 22 22 22 28 28 17 18 20 24 25 21 23 20 23 22 24 24 22 22 18 22 22 23 14
  [409] 20 20 22 19 22 21 27 24 19 21 23 27 23 21 24 19 21 25 20 25 26 24 22 22
  [433] 21 19 23 24 24 21 23 23 19 17 23 22 21 21 19 23 20 16 23 21 21 16 20 21
  [457] 25 23 26 22 23 21 22 19 18 20 25 19 20 24 24 18 25 20 25 20 20 22 24 25
  [481] 19 23 20 21 27 26 20 24 30 20 23 22 25 19 23 20 21 23 21 22 24 25 18 23
  [505] 23 20 16 21 25 20 24 22 22 20 18 23 23 21 25 25 20 18 19 23 22 21 24 22
  [529] 23 23 24 18 19 23 21 25 26 20 21 21 20 20 27 20 21 26 22 26 22 29 21 26
  [553] 17 24 25 22 22 22 22 20 20 23 24 23 26 18 21 23 24 20 23 20 24 26 21 27
  [577] 20 24 21 21 23 22 26 20 19 21 19 24 26 25 22 20 23 24 15 24 23 20 23 20
  [601] 21 24 18 19 21 23 23 28 23 20 22 23 24 24 20 24 20 20 19 26 21 25 26 20
  [625] 24 24 21 19 27 19 21 20 24 21 20 19 21 24 27 23 25 17 25 23 19 20 23 21
  [649] 27 24 22 25 22 25 25 19 23 26 22 22 23 23 21 22 21 23 19 18 22 24 20 20
  [673] 20 26 25 26 19 22 22 25 20 20 23 25 22 25 25 23 21 19 23 23 22 20 23 22
  [697] 24 22 22 24 23 21 21 23 20 16 24 21 22 24 21 19 26 22 21 20 23 21 17 19
  [721] 25 23 23 23 19 21 25 23 19 23 20 22 25 19 20 20 24 21 19 20 24 21 22 21
  [745] 24 24 21 22 22 20 21 23 23 24 25 17 15 22 24 26 22 20 21 20 21 24 20 22
  [769] 22 22 22 22 23 20 22 22 22 22 26 25 21 26 23 19 19 18 19 22 21 20 22 23
  [793] 22 23 21 20 23 21 22 25 21 26 23 20 24 23 22 24 21 22 22 24 21 26 21 23
  [817] 20 21 20 23 19 22 23 25 23 25 23 21 23 26 25 23 21 22 23 24 23 20 25 23
  [841] 24 18 20 22 21 20 23 20 25 27 24 21 23 21 24 20 19 21 21 22 20 21 24 19
  [865] 19 21 23 22 17 22 21 22 23 21 20 22 24 26 18 23 21 24 23 20 19 23 21 19
  [889] 24 24 20 22 21 23 19 21 23 23 22 23 20 21 23 21 20 20 22 27 18 21 21 22
  [913] 24 19 21 21 23 24 21 22 22 22 21 22 19 20 19 23 19 21 18 22 22 23 21 23
  [937] 25 25 28 23 19 21 18 25 24 22 22 21 21 22 23 24 20 20 21 22 20 18 23 21
  [961] 22 22 25 20 22 21 17 21 22 25 27 19 20 21 18 24 25 19 22 20 20 23 21 25
  [985] 21 22 26 20 20 20 25 20 25 25 25 22 18 24 25 21 20 20 24 22 20 24 20 20
 [1009] 21 21 20 24 23 20 26 25 21 23 20 26 21 24 25 21 21 19 26 24 22 18 25 19
 [1033] 27 19 24 24 22 22 21 24 20 23 22 20 20 23 22 24 27 21 23 18 22 22 21 25
 [1057] 21 22 25 22 25 22 18 20 21 24 24 20 21 21 18 27 23 19 24 23 25 23 22 23
 [1081] 22 21 23 22 23 21 23 23 22 23 27 23 23 21 22 26 19 25 23 21 25 22 25 20
 [1105] 18 19 19 26 19 19 20 21 25 25 21 24 24 19 24 24 23 23 20 22 25 20 23 26
 [1129] 24 21 26 21 20 24 24 21 22 24 26 28 24 20 20 22 21 21 24 25 24 23 19 21
 [1153] 23 21 25 24 21 18 23 24 20 22 22 23 20 23 22 26 24 21 25 21 28 22 24 20
 [1177] 24 21 22 22 20 21 23 22 25 23 24 25 24 21 20 20 21 23 24 24 21 19 23 20
 [1201] 21 25 24 23 22 19 21 23 21 20 20 22 23 21 24 25 21 20 25 20 25 21 20 24
 [1225] 19 21 27 25 26 23 20 24 23 21 22 23 23 19 23 25 23 25 21 23 21 24 21 22
 [1249] 20 23 19 22 20 18 20 24 24 22 24 24 22 22 25 22 23 29 23 22 22 16 26 19
 [1273] 20 23 21 27 22 24 22 21 25 20 26 24 25 18 20 23 18 24 22 22 26 25 21 25
 [1297] 24 23 22 22 22 21 25 23 24 25 22 25 24 19 27 19 20 19 23 21 25 25 20 26
 [1321] 19 24 22 23 24 21 26 22 17 23 24 24 23 22 21 21 20 19 20 24 22 24 23 21
 [1345] 23 22 23 23 24 22 18 20 21 28 25 21 22 21 22 21 27 21 24 25 23 22 22 17
 [1369] 19 19 23 20 24 21 26 23 24 26 20 26 21 21 21 25 21 19 22 26 22 23 16 23
 [1393] 23 22 23 20 25 19 27 23 26 23 22 24 21 21 21 24 22 18 21 25 22 22 25 22
 [1417] 19 26 19 23 23 20 23 22 22 20 25 23 23 21 25 19 24 21 21 24 18 18 24 21
 [1441] 23 21 22 25 20 22 19 18 24 23 23 25 19 21 21 27 22 24 18 20 22 18 22 25
 [1465] 22 19 21 19 22 23 25 27 18 26 25 24 22 20 25 27 22 25 23 23 21 22 21 23
 [1489] 19 22 24 27 25 25 23 24 24 18 25 27 19 20 20 24 19 22 20 18 19 23 20 23
 [1513] 22 23 25 22 22 20 21 19 20 20 20 20 19 25 25 23 24 22 23 20 25 20 17 20
 [1537] 18 22 23 26 17 23 22 23 21 19 20 20 25 23 20 20 20 21 24 25 20 22 19 20
 [1561] 23 23 21 26 23 23 22 21 21 26 21 19 22 19 23 21 22 22 22 25 22 22 21 25
 [1585] 23 22 23 20 23 17 21 22 21 20 24 18 26 23 22 24 24 19 22 22 24 27 20 24
 [1609] 22 24 24 24 19 20 23 14 21 22 26 22 23 27 25 22 22 25 22 22 24 23 24 20
 [1633] 21 19 24 23 23 22 20 20 26 23 23 21 21 23 17 26 26 23 21 20 19 21 21 24
 [1657] 21 24 24 17 22 24 27 21 23 23 18 26 25 19 26 23 20 22 21 24 24 20 26 19
 [1681] 24 22 25 21 23 18 26 20 18 20 20 21 23 22 23 21 19 23 21 17 21 23 19 21
 [1705] 22 23 24 18 24 19 19 21 23 19 20 21 20 24 19 24 20 23 24 22 22 21 23 25
 [1729] 19 25 21 23 24 23 19 22 22 24 23 21 22 24 20 23 23 22 21 21 18 21 21 23
 [1753] 21 22 19 19 23 19 24 26 24 22 21 18 18 22 21 21 23 20 26 22 20 25 19 23
 [1777] 19 22 25 21 23 25 27 22 23 19 21 22 25 20 23 23 22 24 24 24 20 22 24 26
 [1801] 24 21 27 20 24 24 23 18 24 24 23 23 21 24 23 23 24 19 22 22 22 28 22 20
 [1825] 24 22 23 20 25 27 22 26 22 26 23 21 23 20 23 20 20 21 20 22 22 25 22 20
 [1849] 21 19 21 20 26 21 23 20 18 27 24 21 23 23 22 27 22 22 23 25 22 23 21 21
 [1873] 23 25 26 20 22 26 21 25 21 20 23 22 22 24 24 23 22 21 26 24 22 20 21 20
 [1897] 23 20 26 16 23 22 27 21 26 20 25 23 28 23 23 21 18 26 23 24 19 20 22 19
 [1921] 22 22 22 21 18 21 26 20 22 22 23 22 22 18 24 23 17 23 23 17 24 24 25 25
 [1945] 25 20 24 23 18 19 21 24 18 24 24 22 20 21 22 22 21 22 25 24 21 22 21 20
 [1969] 23 25 24 28 20 19 21 21 23 24 26 17 23 23 24 21 21 22 23 23 20 22 25 26
 [1993] 23 19 22 22 20 17 22 21 21 22 23 24 28 23 21 26 19 21 22 20 21 21 21 24
 [2017] 23 25 21 25 19 25 25 22 24 18 26 21 20 21 23 24 20 21 17 22 23 27 23 23
 [2041] 21 21 22 20 21 20 23 19 23 23 18 22 22 26 18 27 19 17 24 20 21 24 21 20
 [2065] 26 18 20 19 20 23 18 20 23 19 23 22 17 22 20 21 25 24 24 22 22 21 22 20
 [2089] 23 24 24 21 20 24 20 24 21 25 25 22 21 18 23 24 18 22 26 22 18 20 23 22
 [2113] 24 25 22 25 23 25 22 24 20 21 20 20 20 23 19 21 20 21 19 25 20 25 23 23
 [2137] 24 24 20 22 25 23 20 18 25 26 18 22 25 21 25 23 24 24 19 21 21 24 24 22
 [2161] 20 21 24 22 24 23 19 22 21 23 24 25 26 21 25 24 21 23 24 20 26 25 22 23
 [2185] 23 25 20 22 22 16 22 24 22 24 22 22 22 21 23 22 22 21 21 22 22 26 28 21
 [2209] 23 23 26 20 25 23 18 20 24 24 17 21 22 18 20 25 19 20 26 26 26 20 20 23
 [2233] 23 22 22 22 19 26 22 21 24 24 20 26 20 26 21 21 22 20 25 22 22 24 22 24
 [2257] 20 25 23 22 24 21 23 23 21 19 23 17 22 24 22 18 20 23 24 23 19 22 20 19
 [2281] 21 21 23 24 23 18 23 27 24 19 20 23 19 23 23 22 19 26 23 22 24 22 26 24
 [2305] 23 23 25 19 24 22 20 18 19 23 22 20 22 24 23 22 19 19 18 20 19 22 22 25
 [2329] 22 22 22 22 20 25 19 22 24 24 18 24 22 22 24 20 24 20 20 23 25 26 24 20
 [2353] 21 25 21 23 19 25 23 25 14 20 22 20 25 19 19 26 19 25 25 18 20 21 21 22
 [2377] 20 18 22 19 21 28 21 23 26 22 25 26 25 23 19 19 26 20 22 22 21 22 22 23
 [2401] 26 20 19 23 25 21 24 19 15 24 20 24 21 20 20 19 23 21 25 24 24 21 19 20
 [2425] 20 24 21 22 24 23 24 15 18 19 21 25 22 25 25 18 25 24 20 23 24 22 20 20
 [2449] 24 18 24 18 22 23 21 23 22 20 23 24 24 24 27 23 24 17 23 20 20 23 20 24
 [2473] 23 20 23 23 17 18 19 25 19 19 22 24 21 21 24 23 23 23 22 23 14 20 27 28
 [2497] 21 25 22 26 24 22 24 22 19 24 21 23 24 22 22 25 22 25 23 23 21 21 22 23
 [2521] 22 20 20 23 20 23 23 23 23 24 25 18 28 25 24 24 21 20 22 24 22 25 21 22
 [2545] 21 21 23 20 23 19 20 24 22 22 25 21 23 26 22 22 18 22 20 26 21 22 23 25
 [2569] 20 21 23 23 24 22 25 22 24 25 20 23 23 23 24 24 20 22 22 24 22 22 22 18
 [2593] 22 22 23 23 21 23 22 20 19 27 23 25 21 21 22 23 25 22 24 23 26 18 21 21
 [2617] 24 25 25 21 26 23 20 20 17 21 23 25 24 21 22 25 23 23 23 26 25 18 22 23
 [2641] 21 20 26 22 23 20 23 22 24 22 24 20 25 25 24 22 22 21 20 19 24 23 21 20
 [2665] 22 23 21 26 24 22 23 25 18 21 24 24 24 25 20 23 22 19 20 24 20 21 24 20
 [2689] 21 20 22 22 23 21 24 20 21 21 20 24 22 22 23 25 22 20 20 22 21 23 21 23
 [2713] 26 22 23 22 22 23 21 21 18 21 24 22 21 25 21 24 25 26 23 16 26 26 20 23
 [2737] 22 28 21 22 22 20 21 22 21 22 22 21 23 22 26 20 17 18 21 19 27 20 20 20
 [2761] 21 25 21 25 28 21 23 23 19 16 23 24 25 19 22 29 17 21 22 24 25 19 21 21
 [2785] 22 21 22 19 21 19 21 21 21 19 19 22 25 26 25 22 18 23 21 23 20 21 22 24
 [2809] 25 21 20 20 24 22 20 23 20 21 24 25 24 21 21 18 19 22 24 20 23 22 19 23
 [2833] 16 19 21 21 21 21 19 23 26 23 19 24 26 22 24 26 26 22 21 18 20 21 21 18
 [2857] 22 26 24 23 19 23 18 25 21 26 18 23 20 20 19 22 24 24 20 24 22 21 23 19
 [2881] 21 24 22 18 18 22 24 20 24 21 21 25 20 22 20 21 24 24 24 23 22 19 22 22
 [2905] 25 20 21 19 24 22 21 20 24 20 23 13 19 22 21 18 24 20 26 21 24 23 25 22
 [2929] 22 24 21 21 25 24 19 24 21 21 23 20 18 21 22 18 21 21 25 24 22 22 24 18
 [2953] 22 24 23 22 23 19 20 20 25 19 20 23 26 28 24 22 25 20 23 23 21 19 22 24
 [2977] 22 21 21 25 26 20 23 20 17 19 20 24 23 22 21 21 20 24 20 17 25 24 20 25
 [3001] 23 21 22 19 24 25 24 21 19 26 17 21 26 22 21 21 23 21 24 23 15 21 22 18
 [3025] 23 24 23 22 21 22 21 19 22 21 25 21 21 23 22 21 26 25 23 22 21 22 19 25
 [3049] 23 20 22 27 20 23 25 21 26 22 22 21 22 27 23 23 24 21 20 21 22 26 23 25
 [3073] 23 22 22 19 25 24 22 26 25 22 24 21 22 21 22 19 24 23 20 25 18 19 22 23
 [3097] 20 19 24 20 26 20 23 24 20 24 22 23 23 19 23 22 20 22 19 22 23 20 20 22
 [3121] 22 24 21 21 21 23 25 22 20 23 20 24 24 23 20 25 24 18 20 24 21 23 19 21
 [3145] 22 23 22 26 22 23 22 20 20 25 23 21 22 21 24 25 24 20 19 21 23 22 23 22
 [3169] 23 23 19 23 24 21 20 23 22 19 20 21 20 20 23 23 23 19 22 23 23 26 25 23
 [3193] 25 21 26 15 25 18 21 21 24 23 20 20 17 23 25 20 19 22 21 21 20 18 21 22
 [3217] 25 26 23 23 24 18 22 23 23 23 22 23 23 25 22 25 24 23 23 26 21 20 16 23
 [3241] 26 22 23 20 19 24 23 23 21 23 20 21 24 22 21 21 22 26 19 21 23 21 19 21
 [3265] 24 26 16 25 18 22 24 23 24 23 19 21 22 24 21 25 19 20 26 17 21 20 20 20
 [3289] 20 23 24 22 20 23 24 26 22 21 25 25 24 19 22 22 19 23 21 23 23 22 18 22
 [3313] 23 23 18 25 24 20 22 17 21 20 22 22 22 23 27 17 20 24 23 19 24 23 24 27
 [3337] 23 22 23 23 21 22 24 23 23 22 26 19 21 20 23 21 24 19 21 25 23 25 19 21
 [3361] 24 22 21 24 31 25 22 21 24 23 24 24 23 24 20 25 26 24 20 22 23 23 19 24
 [3385] 22 25 17 20 25 22 24 23 23 22 22 22 22 19 21 22 24 24 26 21 20 20 24 20
 [3409] 23 22 20 23 21 24 20 22 26 22 22 22 20 21 24 23 26 27 23 26 18 23 21 21
 [3433] 25 24 19 25 25 21 24 22 22 22 24 19 17 20 20 21 23 23 22 23 22 23 22 24
 [3457] 26 23 17 22 22 24 21 23 20 22 16 21 22 21 22 26 19 21 22 18 19 28 26 21
 [3481] 20 20 24 26 25 23 17 19 28 24 20 20 21 23 20 21 27 21 21 18 21 23 21 19
 [3505] 22 25 19 22 22 21 23 18 22 21 20 21 25 25 21 24 20 19 19 25 20 21 23 20
 [3529] 22 21 21 22 22 20 20 22 21 23 22 24 21 21 21 18 20 20 20 21 23 21 18 18
 [3553] 19 19 25 21 21 19 26 21 20 26 24 21 23 20 18 22 21 25 22 20 21 22 21 18
 [3577] 24 21 25 25 24 22 20 21 25 21 23 23 22 25 23 23 19 23 23 20 21 23 19 20
 [3601] 23 24 23 21 24 22 25 25 21 22 19 22 20 19 24 23 23 21 21 22 21 21 26 24
 [3625] 23 21 21 21 27 22 20 25 22 20 26 21 21 19 25 23 18 19 22 22 21 20 24 19
 [3649] 28 22 25 21 21 20 19 22 19 19 21 24 23 21 20 21 22 22 24 21 17 23 23 21
 [3673] 23 23 27 21 22 22 21 23 26 19 22 24 19 21 25 22 21 22 25 26 19 20 19 19
 [3697] 21 18 19 23 23 23 26 23 25 19 20 24 23 22 19 18 20 22 22 20 24 23 19 24
 [3721] 23 25 23 18 25 20 22 22 22 19 24 20 22 19 25 19 20 20 20 25 18 25 24 22
 [3745] 23 19 23 23 19 26 21 21 21 23 22 19 24 24 22 22 21 22 21 21 19 20 23 20
 [3769] 21 20 19 22 21 20 23 17 21 18 18 18 23 20 20 26 22 24 22 20 21 20 22 25
 [3793] 24 22 23 23 21 18 21 23 24 23 20 24 25 20 22 26 18 18 22 20 22 22 21 21
 [3817] 21 23 18 19 23 20 24 26 24 22 20 23 21 22 22 22 19 23 22 21 24 19 24 22
 [3841] 22 22 22 23 20 21 22 26 17 25 22 26 23 23 21 20 19 21 26 24 26 23 23 19
 [3865] 26 24 20 23 21 22 24 21 21 22 24 22 24 24 22 23 23 20 19 23 23 23 25 21
 [3889] 24 19 23 19 22 21 24 26 23 20 17 21 25 22 27 22 26 23 24 23 26 22 21 21
 [3913] 20 20 25 23 24 23 17 26 22 23 22 22 23 24 24 17 23 23 22 22 25 23 23 24
 [3937] 18 27 22 24 24 24 21 18 23 22 22 20 23 21 21 21 26 24 25 18 25 21 26 23
 [3961] 22 21 21 21 27 20 24 22 21 21 22 22 23 20 24 21 25 22 23 25 22 22 20 24
 [3985] 23 26 20 22 20 27 25 26 21 22 19 20 28 22 20 24 20 24 22 20 22 22 20 27
 [4009] 23 23 23 20 19 19 19 21 21 18 23 19 22 25 19 26 21 21 20 23 21 25 21 22
 [4033] 21 21 22 22 21 23 26 23 23 23 29 26 18 24 21 25 20 20 26 25 18 23 25 20
 [4057] 24 22 20 21 23 19 20 23 22 23 24 21 23 23 17 20 21 22 21 26 22 24 21 23
 [4081] 20 24 18 18 24 22 20 24 21 20 24 24 24 22 23 20 19 24 26 20 21 22 24 22
 [4105] 26 25 20 20 25 19 24 24 15 19 26 20 23 26 26 22 25 22 24 23 21 25 20 22
 [4129] 21 22 19 24 22 21 21 19 21 23 22 24 25 23 22 25 18 23 22 23 27 23 27 23
 [4153] 18 22 18 21 22 19 26 21 21 22 21 23 23 21 19 17 21 20 18 22 22 18 20 22
 [4177] 24 21 22 22 21 25 22 23 30 21 22 22 22 18 21 23 20 19 20 22 24 23 22 19
 [4201] 24 16 23 22 21 22 23 22 21 21 24 21 18 21 21 20 24 23 23 25 25 22 17 24
 [4225] 21 21 21 23 20 20 26 21 23 19 21 22 22 24 21 22 22 23 23 22 19 20 27 21
 [4249] 20 26 21 25 23 19 20 20 25 21 21 24 21 22 23 23 18 21 25 18 22 22 24 24
 [4273] 22 23 22 19 23 23 21 20 26 21 19 20 21 25 20 22 26 20 19 22 23 22 18 22
 [4297] 23 21 24 25 23 19 24 22 22 25 25 23 21 23 19 20 26 22 22 19 19 24 21 23
 [4321] 17 21 23 24 20 24 23 23 22 23 21 21 22 20 23 24 25 20 24 21 18 22 21 25
 [4345] 23 20 21 18 21 19 22 17 21 21 24 22 23 21 24 23 19 21 18 24 23 18 20 22
 [4369] 20 24 22 20 22 18 19 22 19 22 24 19 21 25 27 18 21 19 21 25 24 21 23 27
 [4393] 19 23 23 22 21 20 18 19 24 23 19 24 23 23 22 23 20 21 16 22 20 21 23 23
 [4417] 22 26 23 16 24 27 26 19 19 23 22 17 23 22 23 26 25 19 18 23 17 22 25 23
 [4441] 22 23 22 25 21 21 21 22 22 21 23 24 21 19 22 25 25 22 24 19 18 22 16 21
 [4465] 21 24 18 23 20 19 22 21 24 21 21 20 19 23 18 24 22 17 22 23 21 22 22 27
 [4489] 19 19 23 25 21 22 23 27 23 26 25 24 21 26 24 22 20 23 25 23 22 23 23 22
 [4513] 22 22 23 23 21 22 18 23 23 19 24 18 27 24 28 21 21 25 20 18 25 20 23 20
 [4537] 25 24 25 21 21 23 18 20 21 23 18 24 23 20 24 25 24 23 21 23 23 23 19 24
 [4561] 23 20 21 22 20 21 22 21 22 23 23 20 23 20 23 20 25 18 24 24 20 23 23 19
 [4585] 23 26 20 22 27 24 22 25 24 23 23 21 20 21 24 25 21 20 21 20 19 22 25 24
 [4609] 22 23 19 21 22 24 21 20 23 21 20 21 24 25 21 22 24 19 25 24 18 19 23 23
 [4633] 25 24 21 20 19 26 21 21 22 27 21 24 24 19 23 25 19 23 22 19 23 16 21 20
 [4657] 24 26 23 23 23 17 23 21 18 24 15 23 25 22 25 21 26 22 23 26 23 25 22 21
 [4681] 25 20 22 21 23 22 19 22 21 23 24 19 25 26 23 23 24 23 18 19 22 22 24 22
 [4705] 24 22 21 23 23 23 21 22 20 25 24 21 21 22 22 23 24 20 25 23 20 24 24 21
 [4729] 24 21 22 20 20 20 22 21 19 18 23 26 21 21 20 23 24 25 17 18 19 22 25 21
 [4753] 21 28 18 25 18 25 22 24 23 24 21 22 22 21 22 19 26 18 27 23 24 22 20 23
 [4777] 21 24 23 27 22 22 22 19 22 21 24 22 20 23 21 22 19 20 24 22 22 24 23 26
 [4801] 20 21 20 24 22 25 23 22 22 23 21 27 22 21 16 26 25 23 24 23 22 24 19 24
 [4825] 22 20 21 22 23 22 25 20 22 23 21 21 22 21 19 23 19 26 21 22 23 20 21 19
 [4849] 21 23 24 27 27 20 21 25 25 18 20 22 22 25 19 21 20 20 22 23 25 23 22 22
 [4873] 24 22 25 21 19 25 20 24 23 24 26 26 22 19 23 20 21 23 23 25 19 24 22 23
 [4897] 21 17 22 19 22 24 25 19 23 26 21 25 22 22 26 25 21 26 25 25 25 23 21 26
 [4921] 20 21 24 21 19 18 25 28 22 25 20 21 23 25 24 23 20 23 24 19 21 23 20 25
 [4945] 19 20 25 22 22 21 23 22 24 22 23 20 24 23 24 23 23 22 22 21 25 19 26 23
 [4969] 17 24 17 23 22 24 23 24 20 23 19 23 16 28 23 23 19 18 21 22 19 19 21 18
 [4993] 22 23 24 19 23 19 24 28 23 24 23 21 25 26 25 23 21 25 17 18 23 21 24 21
 [5017] 20 19 24 22 22 23 21 24 21 22 22 19 19 20 22 23 21 23 22 25 23 23 21 19
 [5041] 20 24 25 18 24 21 22 19 24 25 23 24 20 22 22 22 23 22 21 21 20 21 21 22
 [5065] 23 20 21 21 24 20 21 24 24 20 20 19 20 17 24 18 22 22 21 24 26 21 25 18
 [5089] 22 24 24 28 18 21 22 19 22 20 23 22 21 23 22 23 24 20 26 22 23 21 18 23
 [5113] 24 23 18 20 19 20 24 24 21 25 25 21 20 20 23 25 20 21 22 24 21 22 26 22
 [5137] 23 23 20 22 24 23 20 20 24 26 25 22 22 21 22 25 19 17 16 20 21 22 20 25
 [5161] 19 21 24 20 20 23 25 20 20 26 22 18 23 20 27 20 22 20 22 22 21 22 22 27
 [5185] 22 21 23 20 24 25 22 19 21 22 18 23 23 22 21 20 25 20 22 22 21 20 20 24
 [5209] 21 19 25 21 24 25 21 23 22 22 20 23 27 26 19 21 19 20 19 25 22 22 24 24
 [5233] 26 22 21 20 26 22 27 25 22 24 19 25 24 24 22 20 19 23 22 20 24 19 21 25
 [5257] 21 24 20 22 25 25 23 20 18 22 22 24 22 20 19 21 26 20 25 23 23 20 25 27
 [5281] 28 23 25 22 20 22 23 21 21 23 24 28 20 21 23 18 21 21 21 22 24 22 24 21
 [5305] 24 20 25 22 19 23 22 24 21 22 18 21 21 18 22 19 22 20 21 23 19 21 22 23
 [5329] 20 22 21 25 25 18 22 21 20 18 22 20 21 18 24 17 23 23 23 23 21 25 19 21
 [5353] 22 25 21 20 25 23 23 22 26 18 22 21 23 19 18 17 20 22 25 23 24 23 22 22
 [5377] 25 18 27 22 18 21 21 23 23 23 21 20 21 21 22 19 20 23 18 20 22 23 18 21
 [5401] 21 22 20 23 20 26 18 23 25 24 22 25 24 25 25 24 19 24 21 19 19 26 16 24
 [5425] 19 25 22 23 23 20 20 24 20 20 20 22 23 19 21 20 21 24 26 22 23 23 22 22
 [5449] 23 24 19 20 21 22 23 27 19 23 23 25 23 25 22 21 20 20 23 20 24 21 25 25
 [5473] 20 21 23 25 20 21 24 22 18 23 21 25 23 24 24 22 21 24 25 22 24 25 23 21
 [5497] 22 22 23 22 21 23 21 22 24 24 23 21 21 22 23 23 24 24 26 22 22 21 28 19
 [5521] 18 25 17 22 22 23 23 19 19 22 29 20 23 22 23 18 24 20 23 23 21 17 19 23
 [5545] 16 20 22 20 22 19 21 22 26 26 25 20 19 22 22 19 22 27 21 22 25 17 23 20
 [5569] 25 23 20 19 23 21 21 21 26 21 21 24 21 20 20 21 24 22 22 21 22 20 17 22
 [5593] 25 22 25 22 18 25 22 22 20 23 23 23 16 24 20 22 22 21 22 26 19 19 24 22
 [5617] 23 23 27 23 25 25 21 23 19 22 25 21 24 23 21 18 22 25 18 21 23 25 26 23
 [5641] 17 22 21 19 26 20 23 22 21 24 21 21 23 21 23 22 23 22 25 19 23 21 25 21
 [5665] 22 22 20 22 23 23 23 24 23 25 24 23 22 21 22 21 22 21 19 21 19 22 22 28
 [5689] 24 23 22 23 23 24 24 24 23 21 22 21 27 23 25 24 20 20 23 27 21 22 21 22
 [5713] 24 21 23 26 23 20 24 20 21 22 21 22 24 23 22 23 22 24 22 24 20 20 20 23
 [5737] 24 18 22 24 16 21 20 22 21 24 26 23 18 25 23 23 21 21 20 17 20 20 25 21
 [5761] 22 21 21 19 25 24 18 23 22 22 25 22 24 23 21 23 21 22 21 24 20 22 24 21
 [5785] 18 20 26 28 24 22 23 22 19 23 21 24 25 18 23 24 21 23 21 23 21 19 23 21
 [5809] 24 23 18 24 28 21 26 24 22 25 27 23 21 25 22 17 23 18 26 22 21 22 24 23
 [5833] 22 24 21 20 24 20 18 20 23 19 20 22 20 21 23 15 19 22 21 19 25 22 22 22
 [5857] 24 22 20 21 21 21 19 26 22 20 24 18 24 24 20 19 20 19 23 23 22 23 20 26
 [5881] 24 23 23 23 21 21 25 24 22 25 23 17 21 18 20 24 19 28 23 22 22 21 19 23
 [5905] 22 20 24 22 26 23 22 21 24 23 24 23 24 16 20 21 24 21 20 22 23 17 26 20
 [5929] 20 24 23 19 22 26 27 26 24 22 22 25 25 22 21 19 18 13 24 22 20 22 23 24
 [5953] 21 19 23 26 21 27 25 21 22 21 17 23 20 28 23 22 18 19 18 19 19 24 21 21
 [5977] 23 22 23 21 23 20 20 22 21 20 24 23 22 29 20 23 22 19 23 23 24 22 20 23
 [6001] 20 22 20 25 24 15 21 22 24 22 22 21 23 22 22 19 22 19 25 21 18 22 25 24
 [6025] 24 22 22 25 27 23 29 23 24 20 26 20 21 21 23 22 24 23 24 20 23 26 22 20
 [6049] 23 21 22 20 23 20 21 19 22 21 23 19 24 24 23 21 19 27 22 21 20 21 22 22
 [6073] 20 21 20 20 21 25 24 26 22 21 23 19 22 24 23 24 23 22 20 25 23 22 21 19
 [6097] 22 21 20 22 23 23 25 25 22 21 26 24 21 20 19 24 16 23 23 23 20 22 21 24
 [6121] 20 23 23 17 25 19 20 23 24 20 26 22 23 26 22 20 26 18 21 19 16 21 21 24
 [6145] 22 24 25 25 21 17 20 22 21 18 17 23 18 19 22 20 18 24 19 23 22 27 19 23
 [6169] 22 21 22 21 21 23 21 22 24 21 20 21 26 22 25 26 21 22 25 23 21 24 20 19
 [6193] 23 26 16 19 21 21 22 23 21 21 18 21 20 22 20 22 19 26 26 19 25 23 23 24
 [6217] 25 23 27 24 23 27 25 25 14 19 23 23 22 21 22 21 22 21 24 21 24 22 22 20
 [6241] 21 22 25 22 21 22 22 23 25 20 23 22 28 20 19 21 22 25 19 22 21 22 22 27
 [6265] 23 18 20 25 20 24 21 21 26 21 23 24 24 22 24 20 22 21 24 20 25 23 24 21
 [6289] 24 20 20 21 21 20 27 21 21 23 22 25 22 18 22 22 23 23 27 19 23 23 21 19
 [6313] 24 22 26 24 20 22 24 24 23 22 21 19 23 22 20 21 25 22 21 25 21 26 22 24
 [6337] 25 28 24 21 23 21 20 24 19 19 20 22 21 18 20 24 27 23 24 25 27 20 19 22
 [6361] 19 21 24 17 25 25 24 23 22 26 24 18 25 20 18 25 23 23 25 22 23 24 22 22
 [6385] 24 20 26 22 25 25 21 24 21 25 19 22 22 23 24 25 21 26 19 27 25 23 19 21
 [6409] 24 23 24 22 24 19 22 24 25 19 22 23 25 26 19 26 21 23 17 20 21 23 23 23
 [6433] 23 23 23 21 21 23 23 19 22 24 27 23 20 26 22 22 21 28 21 21 21 22 24 20
 [6457] 24 23 24 20 25 23 21 26 24 22 20 22 20 20 23 21 23 17 20 21 21 23 23 17
 [6481] 22 25 22 24 25 17 24 23 21 18 21 20 23 21 22 21 21 20 19 21 22 22 22 23
 [6505] 22 19 24 27 24 22 25 19 22 27 22 14 22 17 20 22 22 18 25 21 24 19 22 24
 [6529] 20 25 21 19 21 22 19 22 21 22 25 16 22 24 23 21 21 23 21 23 20 19 20 22
 [6553] 21 21 19 23 24 23 23 22 20 21 22 19 20 23 22 22 23 21 26 17 22 21 24 23
 [6577] 27 21 24 21 21 21 19 21 21 21 20 23 25 21 26 27 21 21 26 20 22 18 18 19
 [6601] 23 22 24 24 21 20 22 22 21 20 18 26 22 23 21 25 24 21 22 23 19 24 22 25
 [6625] 25 27 23 19 24 23 26 24 24 21 23 24 24 21 23 23 21 24 22 18 23 19 21 21
 [6649] 19 25 25 22 24 20 21 20 20 21 23 21 21 23 19 21 28 19 23 22 24 17 24 22
 [6673] 24 21 22 19 24 20 21 22 20 26 22 25 20 24 28 25 20 20 23 27 23 22 16 26
 [6697] 24 20 20 23 23 20 23 23 22 21 21 23 21 22 21 26 24 22 26 26 21 24 24 28
 [6721] 20 18 23 21 20 19 24 17 18 20 21 24 18 22 16 21 22 23 23 23 23 20 20 23
 [6745] 26 20 22 24 20 23 24 24 26 25 22 22 16 22 24 24 22 25 23 23 22 23 20 23
 [6769] 26 26 21 24 22 18 21 19 25 22 21 19 24 25 23 19 19 18 22 22 23 19 20 22
 [6793] 23 21 22 23 25 23 30 21 21 23 24 24 24 23 23 24 20 22 24 24 23 21 19 23
 [6817] 23 22 21 28 24 22 23 23 17 23 28 23 24 22 25 25 18 25 20 22 20 25 22 24
 [6841] 23 21 22 19 26 24 25 22 22 21 23 21 23 23 20 26 20 24 19 21 20 20 18 26
 [6865] 22 22 20 24 26 23 23 24 21 20 24 20 24 21 23 24 17 20 26 24 20 19 20 19
 [6889] 22 27 22 16 24 23 22 23 23 23 22 25 24 26 21 22 21 21 21 24 25 23 20 19
 [6913] 23 26 25 23 22 20 24 25 21 20 20 26 25 22 23 22 20 24 20 19 23 21 17 23
 [6937] 19 20 25 25 20 24 25 22 20 22 23 19 23 22 20 21 23 25 22 22 24 21 26 21
 [6961] 23 18 21 26 23 16 20 23 25 22 20 22 24 25 22 17 24 23 24 21 27 20 23 22
 [6985] 21 21 21 24 25 24 25 22 19 21 19 25 20 17 22 24 21 22 23 20 18 20 21 22
 [7009] 19 24 20 19 22 22 18 23 27 27 20 23 25 22 23 20 22 21 18 26 21 19 24 24
 [7033] 23 23 20 25 20 21 25 22 21 20 22 21 23 27 21 27 22 26 22 25 23 21 20 23
 [7057] 21 23 17 24 19 20 24 24 23 28 22 26 22 21 23 22 23 20 23 18 19 21 20 23
 [7081] 22 22 19 21 23 20 25 21 22 23 19 20 23 27 22 20 25 20 20 22 25 25 21 18
 [7105] 20 22 21 25 22 20 20 28 22 24 22 24 23 24 18 26 25 22 20 26 21 25 26 24
 [7129] 23 22 22 20 19 25 21 21 22 21 25 22 25 23 24 21 20 18 22 20 22 22 22 22
 [7153] 19 25 21 24 20 26 23 24 24 22 19 21 24 22 25 22 23 22 22 24 22 21 25 19
 [7177] 22 23 26 18 26 21 20 26 22 20 25 24 24 21 22 20 20 26 19 21 19 21 23 21
 [7201] 18 23 20 26 21 21 26 24 22 25 23 21 24 24 20 19 23 21 20 23 20 23 22 24
 [7225] 17 21 20 23 23 21 23 21 20 21 17 20 23 25 23 22 19 17 18 22 22 20 24 22
 [7249] 21 23 17 18 23 22 19 20 23 19 26 18 23 20 24 17 22 25 20 24 28 19 22 19
 [7273] 22 27 24 19 22 24 20 21 20 20 20 23 27 23 22 21 26 19 22 23 20 26 23 22
 [7297] 19 21 19 21 22 18 28 23 18 26 20 23 22 20 17 23 23 22 24 21 25 25 20 22
 [7321] 22 21 23 20 25 22 18 20 29 23 19 21 22 29 23 22 22 21 23 19 22 23 23 22
 [7345] 15 25 21 24 21 25 22 22 27 19 20 21 21 22 19 20 24 25 21 18 25 23 23 20
 [7369] 19 22 25 23 22 17 20 18 22 23 22 24 21 22 22 19 21 22 20 25 21 21 22 21
 [7393] 23 23 23 23 23 23 18 22 23 23 21 20 22 23 20 19 25 22 18 19 23 23 22 21
 [7417] 18 26 20 17 21 20 22 25 26 18 18 22 24 19 27 18 26 24 22 19 22 26 25 20
 [7441] 20 22 24 19 21 21 20 20 22 21 19 20 18 24 20 24 23 19 22 19 24 23 20 21
 [7465] 23 23 22 23 22 17 28 28 24 25 24 17 27 27 22 25 20 24 23 23 22 26 20 19
 [7489] 21 20 24 20 26 23 23 23 20 23 25 25 25 22 19 24 21 20 24 22 19 23 18 19
 [7513] 26 24 24 25 23 20 20 20 18 26 22 20 19 24 23 19 23 18 24 27 24 29 20 22
 [7537] 24 22 18 23 25 24 23 21 19 23 25 25 24 23 21 21 23 22 20 24 25 21 16 21
 [7561] 21 25 22 24 20 23 22 18 20 22 20 25 21 22 26 23 24 25 19 22 21 24 20 27
 [7585] 22 24 23 19 23 23 26 21 25 23 19 23 20 22 18 22 20 22 23 19 21 24 22 25
 [7609] 22 20 20 20 21 21 23 23 25 21 21 21 23 21 19 24 21 18 20 22 23 21 24 24
 [7633] 21 20 22 23 18 23 26 20 22 26 23 21 20 19 26 22 20 23 22 22 20 18 23 20
 [7657] 19 20 25 25 22 23 22 25 23 21 24 18 23 23 24 23 20 21 19 22 24 21 23 19
 [7681] 21 28 22 22 22 25 20 22 19 23 22 26 24 22 19 18 22 22 23 18 22 21 21 24
 [7705] 22 21 24 24 22 25 23 26 25 22 19 23 25 21 21 17 17 23 19 23 25 22 23 23
 [7729] 20 21 24 22 21 22 21 20 21 22 19 23 23 20 22 22 20 22 21 21 20 23 25 24
 [7753] 21 20 21 23 21 23 19 21 21 26 23 28 24 21 24 19 20 24 24 20 19 23 20 21
 [7777] 23 25 23 25 20 22 20 21 24 24 25 20 21 22 21 22 20 23 20 22 21 22 24 22
 [7801] 25 23 21 22 23 25 21 20 20 25 22 23 21 20 19 23 22 19 25 22 19 22 16 20
 [7825] 25 24 20 20 25 20 22 22 24 23 22 27 25 16 18 21 21 19 22 19 20 23 18 23
 [7849] 26 24 24 20 24 23 17 25 19 18 23 19 25 21 22 25 25 22 25 18 19 23 19 23
 [7873] 27 22 20 19 24 20 24 23 23 20 24 23 20 23 22 22 22 24 20 21 22 21 23 19
 [7897] 24 21 19 23 26 23 21 21 22 21 21 21 20 22 23 19 26 20 23 23 22 23 20 20
 [7921] 24 21 21 23 23 24 23 25 22 21 26 22 25 23 21 24 18 26 20 18 27 24 21 23
 [7945] 22 23 22 21 23 18 26 23 24 24 23 21 28 21 27 23 23 23 22 26 22 22 22 22
 [7969] 20 21 23 22 25 23 21 22 19 18 18 19 25 21 19 22 23 23 23 22 20 23 21 27
 [7993] 18 26 17 21 23 23 21 23 17 23 22 21 22 22 24 25 26 22 24 25 21 25 27 24
 [8017] 23 23 20 23 22 25 24 24 19 21 24 19 24 20 22 23 25 20 20 23 20 22 19 25
 [8041] 21 22 21 21 21 22 22 24 20 25 23 24 24 22 19 22 20 23 25 20 23 23 19 17
 [8065] 17 21 22 22 25 22 24 23 25 25 25 25 20 18 19 22 24 25 23 21 21 23 22 22
 [8089] 22 25 21 22 23 23 18 20 22 26 23 26 20 22 19 19 25 26 25 23 23 21 23 22
 [8113] 19 21 22 25 19 28 21 19 19 22 25 26 23 22 25 22 22 26 24 23 23 26 26 22
 [8137] 22 20 22 24 22 18 22 19 20 23 21 21 18 23 23 21 22 23 19 20 23 22 23 21
 [8161] 24 21 19 24 27 19 20 23 23 20 20 24 24 25 20 22 25 18 20 19 19 24 21 23
 [8185] 21 22 21 20 22 23 22 20 21 22 22 18 24 23 25 21 23 22 18 17 20 22 23 21
 [8209] 18 19 21 24 26 19 20 25 25 23 23 22 20 19 20 27 19 16 23 22 23 24 26 21
 [8233] 16 20 21 22 21 21 26 23 21 20 25 24 21 22 25 21 23 23 17 22 21 24 21 24
 [8257] 24 20 25 21 19 21 23 24 23 20 24 27 22 19 20 28 22 24 22 25 21 21 24 21
 [8281] 28 23 22 23 19 22 25 23 24 22 24 23 21 22 20 22 21 25 20 24 25 21 26 24
 [8305] 24 21 21 18 24 26 23 24 22 21 21 21 17 21 27 21 24 21 23 18 16 24 21 21
 [8329] 21 22 23 21 23 23 22 24 21 25 21 19 22 23 18 22 22 22 20 22 26 22 20 18
 [8353] 22 25 27 22 19 20 25 22 25 24 22 22 22 24 24 24 23 24 21 19 28 24 20 23
 [8377] 21 20 21 26 26 22 24 23 24 19 25 20 22 21 20 17 27 23 20 21 23 21 19 22
 [8401] 27 19 23 23 23 25 24 23 24 23 20 23 25 26 22 25 20 25 24 23 21 23 23 26
 [8425] 22 21 24 24 19 20 28 24 23 20 20 25 24 19 22 23 20 23 22 17 24 26 23 23
 [8449] 20 20 22 23 24 21 26 25 21 23 20 22 23 23 23 24 22 26 24 21 18 24 23 24
 [8473] 22 16 22 18 21 20 23 18 25 22 21 22 22 21 25 20 21 27 21 21 22 23 23 18
 [8497] 16 21 27 27 22 19 29 21 22 18 25 25 22 18 21 27 18 21 24 23 21 23 21 20
 [8521] 20 24 24 24 26 21 20 22 24 27 18 18 24 26 20 27 21 21 20 24 24 23 21 23
 [8545] 23 23 20 22 20 28 20 21 22 27 27 26 22 23 21 19 24 19 24 23 21 19 22 23
 [8569] 24 23 23 23 25 23 21 22 20 19 22 21 22 24 23 23 21 25 24 21 23 21 23 20
 [8593] 16 22 24 26 22 22 19 22 19 22 22 23 23 28 19 22 28 25 19 25 18 26 23 24
 [8617] 23 21 22 23 20 23 22 20 21 22 16 24 21 24 20 25 20 26 19 26 22 23 22 23
 [8641] 19 23 23 24 23 25 24 24 21 23 25 22 21 21 21 23 25 23 21 19 17 21 19 24
 [8665] 21 20 22 22 22 24 18 19 23 16 24 22 22 20 20 20 20 21 20 21 24 22 22 20
 [8689] 27 21 21 23 24 22 23 25 21 24 25 23 24 23 19 21 26 22 25 27 19 22 21 21
 [8713] 26 24 23 18 19 25 24 24 23 23 21 20 20 24 22 24 21 21 23 22 24 26 21 19
 [8737] 21 22 27 23 22 19 19 25 18 25 22 23 19 23 20 19 23 23 24 21 20 26 23 23
 [8761] 22 22 24 23 20 21 24 22 20 21 20 18 24 23 22 23 19 23 26 24 21 21 26 24
 [8785] 20 22 26 23 20 24 19 22 22 17 23 23 24 21 23 21 21 19 18 20 22 20 23 22
 [8809] 22 21 23 23 26 21 25 21 22 23 24 20 24 23 23 26 24 26 24 20 22 22 25 24
 [8833] 20 23 22 24 22 22 22 21 20 26 20 24 22 24 23 23 22 23 23 23 24 24 18 24
 [8857] 21 24 21 17 22 26 24 21 21 20 21 25 21 21 19 18 17 21 21 22 21 19 22 23
 [8881] 20 22 21 25 18 20 25 24 21 21 23 25 19 24 23 20 17 25 19 21 21 22 19 24
 [8905] 23 23 24 18 22 22 20 22 26 26 18 19 21 22 22 21 21 21 23 23 21 23 23 18
 [8929] 22 21 24 22 25 20 22 22 22 21 17 21 19 18 23 23 18 21 19 18 22 23 20 23
 [8953] 23 23 24 23 22 22 21 23 22 25 19 22 19 19 21 21 23 26 22 21 24 25 21 21
 [8977] 23 20 27 21 25 27 26 26 24 20 23 22 28 26 22 23 21 21 19 19 23 20 26 22
 [9001] 21 23 21 19 22 24 21 23 23 20 21 26 21 24 22 20 18 19 20 21 22 21 23 20
 [9025] 20 21 21 21 24 24 25 24 22 22 22 21 24 22 20 22 23 21 22 20 19 22 19 20
 [9049] 23 20 18 28 24 25 16 23 23 24 25 18 22 20 23 20 22 21 22 21 21 21 24 25
 [9073] 21 21 21 21 25 27 24 21 23 18 23 24 21 20 25 22 23 17 20 24 17 24 25 22
 [9097] 19 23 20 23 20 19 23 23 22 22 20 24 24 21 24 24 26 18 20 19 21 21 25 20
 [9121] 19 18 20 23 23 26 20 21 24 18 20 22 18 22 20 15 20 22 20 20 23 20 23 24
 [9145] 25 23 25 22 20 24 23 22 23 21 29 20 21 20 20 19 28 18 26 20 21 21 21 21
 [9169] 25 18 17 23 20 24 22 22 23 19 20 23 25 23 23 24 19 20 18 22 22 22 22 24
 [9193] 22 22 23 21 24 22 25 25 19 24 24 18 23 21 20 22 22 22 21 20 25 21 25 25
 [9217] 20 22 16 23 22 22 21 22 24 20 23 20 21 26 23 26 24 22 22 25 20 20 26 21
 [9241] 23 25 21 21 25 17 26 20 21 23 23 24 27 24 26 20 19 24 18 20 22 20 20 27
 [9265] 21 20 25 24 22 21 25 22 17 24 21 24 23 27 18 27 19 22 24 22 23 26 26 19
 [9289] 23 19 19 21 21 24 19 22 24 26 24 27 23 24 24 18 20 23 21 24 22 22 22 24
 [9313] 17 19 23 22 21 23 19 25 20 25 19 20 24 21 22 25 27 19 22 26 24 27 21 22
 [9337] 21 20 25 18 20 18 20 17 21 21 24 25 24 22 19 24 21 23 21 17 19 25 24 19
 [9361] 19 18 20 18 21 20 26 22 21 24 22 20 21 18 22 18 22 21 19 22 25 22 24 17
 [9385] 26 20 22 23 22 20 18 21 23 17 23 18 17 22 24 18 21 24 26 27 24 20 19 18
 [9409] 22 20 16 22 20 17 25 23 21 19 23 24 24 26 20 20 17 20 19 18 23 24 25 16
 [9433] 21 22 22 21 19 21 24 23 22 26 23 23 24 21 20 21 18 20 20 25 17 21 22 20
 [9457] 20 19 25 24 22 22 25 19 17 24 24 22 21 23 27 19 23 24 21 19 22 19 22 21
 [9481] 25 25 23 22 22 24 23 25 23 17 19 20 22 22 25 19 20 22 20 19 22 28 20 26
 [9505] 25 21 23 23 22 22 23 23 23 20 24 19 23 23 22 23 23 21 18 21 17 22 23 25
 [9529] 21 22 20 24 21 24 27 19 20 22 23 18 22 22 20 23 28 24 19 20 23 27 27 23
 [9553] 21 20 26 18 21 20 24 18 19 23 22 19 23 20 19 24 23 24 22 20 27 18 22 24
 [9577] 21 16 24 20 25 23 23 19 22 22 24 24 26 25 25 22 23 22 21 20 26 23 19 21
 [9601] 22 24 25 20 26 20 26 21 17 20 19 21 25 23 23 25 22 23 21 19 23 22 21 19
 [9625] 20 22 22 20 24 21 29 20 24 23 22 23 22 18 20 20 23 22 24 21 26 22 19 19
 [9649] 19 22 23 21 24 21 23 21 23 23 21 21 22 23 23 25 23 23 23 18 20 19 23 23
 [9673] 24 24 23 18 23 22 21 28 24 23 18 16 18 18 19 23 23 22 26 25 24 24 22 20
 [9697] 25 24 21 21 20 19 28 23 18 24 23 24 22 23 23 24 25 20 22 21 17 21 21 23
 [9721] 19 20 27 23 25 22 20 21 24 23 26 18 27 24 25 20 27 21 21 20 23 22 23 19
 [9745] 22 22 21 23 19 26 21 21 23 17 21 21 21 23 24 25 20 24 20 25 20 20 22 25
 [9769] 22 24 21 24 21 20 19 19 21 22 19 25 26 20 23 20 19 25 24 20 22 21 25 24
 [9793] 22 22 23 24 20 18 23 26 24 22 23 23 22 20 27 23 22 20 21 20 22 19 21 19
 [9817] 20 25 22 23 21 22 23 21 23 19 23 20 25 23 21 20 19 20 20 20 25 23 21 24
 [9841] 23 23 27 23 23 24 17 22 22 22 21 24 22 16 22 23 22 21 20 18 26 24 19 23
 [9865] 21 18 22 21 23 19 20 19 24 20 22 26 18 19 23 20 20 25 19 22 24 21 22 20
 [9889] 20 24 23 19 21 24 23 21 22 22 22 21 21 25 23 21 22 21 24 18 20 20 22 25
 [9913] 24 19 21 22 26 20 21 23 21 22 23 24 25 24 23 23 23 20 21 21 23 24 22 21
 [9937] 18 19 24 24 25 24 20 24 25 20 18 24 23 20 24 21 24 18 23 19 21 23 23 20
 [9961] 23 22 22 21 20 18 22 24 17 25 23 20 25 24 22 19 23 24 18 22 22 24 23 23
 [9985] 22 19 23 20 21 25 22 21 23 23 19 20 22 22 22 21
In [122]:
matrix_3_df <- as.data.frame(matrix_3)
In [123]:
histogram_BCN_one_to_one_orthologs_skogs <- ggplot(matrix_3_df, aes(x=matrix_3))+ geom_histogram(binwidth=0.5, alpha=0.4)
In [124]:
#tabulate, across the 10,000 randomized sets of Skogsbergia sp. transcripts, how often each number of unique modules occurs
table(matrix_3_df)
matrix_3_df
  13   14   15   16   17   18   19   20   21   22   23   24   25   26   27   28 
   2    6   12   56  150  385  725 1196 1529 1676 1623 1172  779  431  168   69 
  29   30   31 
  17    3    1 
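The randomized module counts tabulated above appear to come from repeatedly drawing random sets of Skogsbergia sp. transcripts and counting how many distinct WGCNA modules each set falls into. The cell that generated `matrix_3` is not part of this section. The sketch below shows one way to build that kind of null distribution. It assumes a named vector `module_of` that gives the module label of every transcript, and a draw size `set_size` equal to the size of the ortholog set. `module_of`, `set_size`, `module_null` and the toy data are all hypothetical names, not objects from the notebook.

```r
# Hypothetical sketch of a module-randomization null distribution.
# Assumptions (not from the notebook): `module_of` is a named character vector
# holding the WGCNA module label of each transcript (names = transcript IDs),
# and every random draw has the same size as the ortholog set (`set_size`).
module_null <- function(module_of, set_size, n_iter = 10000, seed = 1) {
  set.seed(seed)  # make the random draws reproducible
  vapply(seq_len(n_iter), function(i) {
    drawn <- sample(names(module_of), set_size)  # random transcripts, without replacement
    length(unique(module_of[drawn]))             # number of distinct modules hit by this draw
  }, integer(1))
}

# Toy usage: 200 transcripts spread over 4 made-up modules
set.seed(42)
toy_modules <- setNames(
  sample(c("blue", "brown", "turquoise", "grey"), 200, replace = TRUE),
  paste0("tx", 1:200)
)
null_counts <- module_null(toy_modules, set_size = 10, n_iter = 1000)
table(null_counts)  # same kind of summary as table(matrix_3_df)
```

Sampling without replacement keeps each draw a set of distinct transcripts. That is the natural analogue of comparing against a fixed list of orthologs.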
In [125]:
histogram_BCN_one_to_one_orthologs_skogs_prettyversion <- 
ggplot(matrix_3_df, aes(x = matrix_3)) +
labs(x = "Skogsbergia sp. Modules", y = "Count") +
# binwidth = 1 gives one bar per integer module count (bins = 16 over the 13-31 range put some neighbouring counts into the same bar)
geom_histogram(color = "darkblue", fill = "lightblue", size = 1, binwidth = 1) +
scale_x_continuous(breaks = seq(min(matrix_3_df$matrix_3), max(matrix_3_df$matrix_3))) +
theme(text = element_text(family = "Arial"),
      axis.title.x = element_text(size = 20),
      axis.title.y = element_text(size = 20),
      axis.text.x = element_text(size = 15),
      axis.text.y = element_text(size = 15),
      legend.position = "none") +
# dashed vertical reference line at 24 modules; a constant colour goes outside aes(),
# otherwise "black" is mapped to the default palette and the line is not drawn in black
geom_vline(xintercept = 24, color = "black", linetype = "dashed", size = 1)
In [204]:
histogram_BCN_one_to_one_orthologs_skogs_prettyversion
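The dashed line at 24 modules marks the reference point against which the randomized distribution is read. A common way to turn that visual comparison into a number is an empirical tail proportion. This is the fraction of randomized sets at least as extreme as the reference value, with 1 added to both the numerator and the denominator so the estimate is never exactly zero. The sketch below assumes that 24 is the observed module count for the BCN one-to-one ortholog set, which this section does not state. `empirical_p` and `toy_null` are hypothetical names.

```r
# Hypothetical helper: empirical tail proportion of a randomization (null) distribution.
# `observed` is assumed to be the module count of the real ortholog set.
# The +1 terms follow the usual permutation-test convention.
empirical_p <- function(null_counts, observed, tail = c("lower", "upper")) {
  tail <- match.arg(tail)
  n_extreme <- if (tail == "lower") {
    sum(null_counts <= observed)  # random sets hitting as few or fewer modules
  } else {
    sum(null_counts >= observed)  # random sets hitting as many or more modules
  }
  (n_extreme + 1) / (length(null_counts) + 1)
}

# Toy usage with a made-up null distribution of 9 values
toy_null <- c(20, 21, 22, 22, 23, 24, 25, 26, 27)
empirical_p(toy_null, observed = 21, tail = "lower")  # (2 + 1) / (9 + 1) = 0.3
empirical_p(toy_null, observed = 26, tail = "upper")  # (2 + 1) / (9 + 1) = 0.3
```

With the notebook's objects, this would be called as `empirical_p(matrix_3, observed = 24, tail = "lower")`. The lower tail asks whether the real set falls into fewer modules than random sets of the same size. The upper tail asks whether it falls into more.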

References

P. Langfelder, S. Horvath, WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics 9, 559 (2008)

M. I. Love, W. Huber, S. Anders, Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014)

A. Alexa, J. Rahnenführer, topGO: Enrichment Analysis for Gene Ontology. R package (2024)

P. Shannon et al., Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 13, 2498 (2003)
